
after google spent 7 years on a moonshot project and failed, its former leader asks: do machines have to be like humans?

2024-09-16


hans peter brondmo, who led alphabet's everyday robots project, joined google in 2016 to work on robots. at the time, social media and the mobile internet were the hottest topics, and embodied intelligence seemed like a castle in the air: everyone yearned for it, but it felt impossibly far away.

brondmo himself did not join to launch some grand project. google had acquired nine robotics companies, and his job was to figure out where to put their people and technology.

these robotics engineers were eventually folded into google's x lab, known as the "moonshot factory."

the "moon landing factory" has been deeply engaged in the field of robotics for seven years. later, large language models appeared, and the future of embodied intelligence was more dazzling than ever. however, google decided to close everyday robots, a project that was highly anticipated in this laboratory.

recently, brondmo wrote a long article for wired magazine, reviewing his experience at google and a question he has been thinking about for a long time: do machines have to be like "humans"?

the following is brondmo's first-person account, lightly edited and abridged.

robots are harder than "landing on the moon"

google x was founded in 2010 with a big idea: that google could solve the world's toughest problems.

x was purposely located in a separate building a few miles away from the main office complex, to foster its own culture where people could think outside the box.

we put a lot of effort into encouraging our members to take risks, experiment quickly, and even “celebrate failure”, because failure simply means that the goals we set were extremely ambitious.

when i joined, the lab had already spawned waymo, google glass, and other science-fiction-sounding projects, like flying wind turbines and stratospheric balloons beaming internet to underserved areas.

what distinguishes x from other silicon valley startups is that its members are encouraged to think big and long-term. in fact, x has a “formula” for determining whether a project is a moonshot.

first, the project needs to demonstrate that the problem it solves affects hundreds of millions or even billions of people. second, there must be a breakthrough technology that gives us a new way to solve the problem. finally, there needs to be a radical business or product solution that sounds just on the edge of crazy but not completely unfeasible.

giving ai a body

it’s hard to imagine anyone better suited to run x than astro teller, whose official title is literally “captain of moonshots.” in the google x building, a three-story converted department store, you can always see astro wearing his trademark roller skates.

add in his ponytail, his ever-friendly smile and, of course, the name “astro,” and you might feel like you’ve stepped into an episode of the hbo series silicon valley.

when astro and i first sat down to discuss what we should do with the robotics companies google had acquired, we agreed that we should do something. but what?

until now, most useful robots have been large, unintelligent, and dangerous, confined to factories and warehouses where they require close supervision or cages to protect people from harm. how can we create robots that are both helpful and safe in everyday environments? that requires a new approach.

in other words, we want to give ai a body in the physical world, and if there is one place where projects of this scale can be conceived, i am sure it is x.

it will take a long time, a lot of patience, and a willingness to try crazy ideas and fail at many of them. it will require major breakthroughs in ai and robotics, and it will likely cost billions (yes, billions).

the convergence of ai and robotics is inevitable, and we feel that many things that have only existed in science fiction until now will soon become reality.

it’s really hard

about once a week, i talk on the phone with my mother, and her opening question is always the same: “when are the robots coming?”

she doesn’t even say hello, she just wants to know when our robots will come to help her. i reply, “it’ll take a while, mom.” and she’ll say, “they better hurry!”

Hans Peter Brondmo

my mother lives in oslo, norway, which has excellent public health care; caregivers come to her apartment three times a day to help her with a range of tasks and chores, mostly related to her advanced parkinson's disease.

while these caregivers have enabled her to live independently in her own home, my mother hopes that a robot can help her with the little things that are now too awkward to handle, or just provide her with an arm to lean on from time to time.

"you know that robotics is a systems problem, right?" jeff asked me with a probing look. jeff bingham has a doctorate in bioengineering and is a thin, serious guy who grew up on a farm and is known for knowing almost everything.

an important point jeff was trying to make is that a robot is a very complex system and its overall performance depends on its weakest link.

for example, if the subsystem responsible for vision has difficulty perceiving objects in front of it in direct sunlight, then when sunlight shines through a window, the robot may suddenly become "blind" and stop working.

or if the navigation system doesn't understand stairs, the robot could fall down the stairs and injure itself and possibly an innocent bystander. building a robot that can live and work with us is hard. really, really hard.

for decades, people have tried to program various forms of robots to perform even simple tasks, such as grabbing a cup on a table or opening a door, but these programs always end up being extremely fragile and fail at the slightest change in the environment.

once you start thinking it all through, you realize that simply picking up a green apple and putting it in the glass bowl on your kitchen table becomes an almost impossible puzzle, unless everything is locked down in fixed, preset positions and the lighting is just right and never changes. that is why factory robots are caged: everything, from the lighting to the placement of the items they work on, is predictable, and they never have to worry about bumping into people.

the real world lacks predictability, just like that beam of sunlight. and we haven’t even touched on the really hard parts, like moving around in the cluttered spaces where we live and work.

how to think about learning robots

but apparently, you only need 17 machine learning experts.

at least that’s what larry page told me, and it’s one of his classic, cryptic insights.

i tried to argue that we couldn’t possibly build the hardware and software infrastructure with just a small group of machine learning researchers.

he waved his hand dismissively. "you only need 17."

i was confused. why not 11? or 23? i must have been missing something.

broadly speaking, there are two main approaches to applying ai in robotics. the first is a hybrid approach, where different parts of the system are powered by ai and then stitched together with traditional programming.

in this way, the vision subsystem might use ai to recognize and categorize the world it sees. once it creates a list of the objects it sees, the robot program takes that list and uses methods in the code to act on it.

for example, if the program is written to pick up that apple from the table, the ai-driven vision system will detect the apple, and the program will then select "type: apple" from the list and use the control software to tell the robot to reach out and pick it up.
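to make the division of labor concrete, here is a minimal python sketch of such a hybrid pipeline. the detector, the Detection record, and the arm interface are hypothetical placeholders for illustration, not anything everyday robots actually shipped: ai handles perception, and everything downstream of the object list is ordinary hand-written code.

```python
# a minimal sketch of the hybrid approach: an ai vision model produces a list of
# detected objects, and hand-written control logic decides what to do with them.
# the detector and the arm interface are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str              # e.g. "apple", "cup"
    position: tuple         # (x, y, z) in the robot's workspace frame

def detect_objects(camera_image) -> list[Detection]:
    """stand-in for the ai-driven vision subsystem (a trained object detector)."""
    # in reality this would run a neural network on the image; here we fake one result
    return [Detection(label="apple", position=(0.40, 0.10, 0.02))]

def pick_up(label: str, detections: list[Detection], arm) -> bool:
    """traditional, hand-coded logic layered on top of the ai perception."""
    targets = [d for d in detections if d.label == label]
    if not targets:
        return False                      # nothing matching "type: apple" was seen
    target = targets[0]
    arm.move_to(target.position)          # scripted motion through the control stack
    arm.close_gripper()
    arm.lift()
    return arm.holding_object()
```

the fragility described earlier lives in exactly that hand-written layer: if the detector misses the apple, or the scripted motion meets something unexpected, the whole pipeline simply stops working.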

another approach is end-to-end learning (e2e), which attempts to learn the entire task, such as "pick up an object" or even more comprehensive efforts, such as "tidy up the table." the learning process is achieved by exposing the robot to large amounts of training data - much like the way a human learns to perform a physical task.

if you ask a young child to pick up a cup, they may need to learn what a cup is, understand that it can contain liquid, and, while playing with the cup, repeatedly tip it over, spilling a lot of milk in the process. but through demonstration, imitation, and lots of playful practice, they will eventually learn to do it—eventually without even having to think about the steps.
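for contrast, here is a rough sketch of what end-to-end learning looks like in code, again with illustrative shapes and names rather than the actual models used at everyday robots. a single network maps raw camera pixels to arm commands and is trained by imitating recorded demonstrations (behavior cloning is just one way to do this):

```python
# a minimal sketch of end-to-end learning: one network maps camera pixels directly
# to arm commands, trained to imitate demonstrations. sizes are illustrative only.
import torch
import torch.nn as nn

class PixelsToActions(nn.Module):
    def __init__(self, action_dim: int = 7):          # e.g. 6-dof arm pose + gripper
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, action_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, height, width) -> actions: (batch, action_dim)
        return self.head(self.encoder(images))

def train_step(policy, optimizer, images, expert_actions):
    """one behavior-cloning update: nudge the policy toward what the demonstrator did."""
    loss = nn.functional.mse_loss(policy(images), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

notice that nothing in this version ever mentions "type: apple"; whatever the robot ends up knowing about apples has to emerge from the training data.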

i gradually understood. until we could conclusively demonstrate that a robot could learn to perform a task end-to-end, nothing else would matter. only then would we have a real chance of robots reliably performing these tasks in the messy and unpredictable real world, and only then would this qualify as a true moonshot.

this was not about the number "17". it was about the idea that big breakthroughs require small teams, not an army of engineers. obviously, a robot is more than its ai brain; we would still need to design and build a physical machine.

however, it was clear that a successfully learned end-to-end task would convince us, in the language of the moonshot program, that we could escape the pull of earth's gravity.

one-armed robot

peter pastor is a german roboticist who earned his phd in robotics at the university of southern california. in the rare hours when he is not working, peter tries to keep up with his kitesurfing girlfriend. in the lab, he spent most of his time wrangling 14 proprietary robotic arms, later replaced by seven industrial kuka arms, in a setup we called the "arm farm."

the arms run around the clock, constantly trying to pick up objects from a box, such as sponges, legos, rubber ducks, or plastic bananas. initially, they are programmed to move their claw grippers into the box from a random position above the box, close the gripper, pull up, and then see if they have caught anything. a camera above the box captures the object in the box, the arm's movements, and whether it was successful.

this training lasted for several months. at first, the robots had only a 7% success rate. but every time a robot succeeded, it received positive reinforcement: the "weights" in its neural network were adjusted according to the outcome, reinforcing desired behaviors and discouraging undesirable ones. eventually, these robotic arms learned to successfully grasp objects more than 70% of the time.
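a heavily simplified sketch of that trial-and-error loop might look like the following, with placeholder arm, camera, and model objects standing in for the real hardware and the real learning algorithm:

```python
# a toy version of the arm-farm loop: attempt a grasp, check whether anything was
# lifted, and feed that success signal back into the model. every object here is a
# placeholder for real hardware and a real learning algorithm.
import random

def run_grasp_episode(arm, bin_camera, grasp_model):
    image = bin_camera.capture()                 # overhead view of the bin
    grasp = grasp_model.propose(image)           # where and how to close the gripper
    arm.execute(grasp)
    success = arm.lifted_something()             # did anything come up with the gripper?
    reward = 1.0 if success else 0.0
    grasp_model.update(image, grasp, reward)     # weights shift toward rewarded behavior
    return success

def train(arms, cameras, grasp_model, episodes=100_000):
    successes = 0
    for i in range(episodes):
        arm = random.choice(arms)                # the farm runs many arms around the clock
        if run_grasp_episode(arm, cameras[arm.id], grasp_model):
            successes += 1
        if (i + 1) % 1000 == 0:
            print(f"success rate so far: {successes / (i + 1):.1%}")
```

the actual system used far more sophisticated reinforcement-learning machinery, but the core feedback loop, try a grasp, observe success or failure, nudge the weights, is the same idea.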

one day, peter showed me a video of a robotic arm not only accurately grasping a yellow lego brick, but also pushing other objects out of the way to get a clearer grasping angle.

i knew this marked a real turning point: the arm had not been explicitly programmed to perform this action with traditional heuristics. the behavior was acquired through learning.

but even so—seven robots taking months to learn how to grab a rubber duck? that wasn’t enough. even a few hundred robots, practicing for years, wasn’t enough to teach them to perform their first useful, real-world task. so we built a cloud-based simulator and created more than 240 million virtual robot instances in 2021.

think of the simulator as a giant video game, with a physics model realistic enough to simulate the weight of an object or the friction of a surface.

thousands of virtual robots use their virtual camera input and virtual bodies (modeled after real robots) to perform tasks, such as picking up a cup from a table.

they run simultaneously, trying and failing millions of times, collecting data to train the ai ​​algorithms. once the robots perform well enough in simulation, the algorithms are transferred to physical robots for final real-world training so that they can implement the newly learned actions.
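stripped to its skeleton, that workflow might look like the sketch below; the simulator factory, the policy object, and the robot interfaces are hypothetical placeholders, and the real pipeline was vastly larger:

```python
# a skeleton of the sim-to-real workflow: many simulated robots collect experience
# in parallel, a policy is trained on that data, then fine-tuned on a handful of
# physical robots. the simulator, policy, and robot interfaces are placeholders.
from concurrent.futures import ProcessPoolExecutor

def collect_sim_episodes(sim_factory, policy_weights, n_episodes):
    """run one worker's share of virtual-robot episodes and return its experience."""
    sim = sim_factory()                           # virtual robot plus physics model
    sim.load_policy(policy_weights)
    return [sim.run_episode(task="pick up the cup") for _ in range(n_episodes)]

def train_in_simulation(sim_factory, policy, workers=64, episodes_per_worker=10_000):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(collect_sim_episodes, sim_factory, policy.weights(), episodes_per_worker)
            for _ in range(workers)
        ]
        experience = [episode for f in futures for episode in f.result()]
    policy.fit(experience)                        # learn from millions of virtual attempts
    return policy

def finetune_on_real_robots(policy, robots, episodes_per_robot=500):
    """a final, much smaller round of training on physical hardware."""
    real_experience = [
        robot.run_episode(policy) for robot in robots for _ in range(episodes_per_robot)
    ]
    policy.fit(real_experience)
    return policy
```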

i always think of this simulation process as a robot dreaming all night and then waking up having learned something new.

it’s the data, stupid.

when chatgpt first appeared, it seemed like magic. an ai-powered system could write complete paragraphs, answer complex questions, and carry on sustained conversations. but at the same time, we understood its fundamental constraint: achieving this takes enormous amounts of data.

robots already use large language models to understand commands and visual models to understand what they see, which makes their demo videos on youtube look amazing.

but teaching robots to live and work with us autonomously is an equally large data problem. simulation and other techniques can generate training data, but it’s unlikely that robots will simply “wake up” one day, highly capable, powered by a foundation model that controls the entire system.

we still don’t know how complex a task we can teach a robot to perform with ai. i have come to believe that, beyond very narrow, well-defined tasks, it will probably take tens of thousands, if not millions, of robots performing tasks over and over in the real world to collect enough data to train an end-to-end model. in other words, don’t expect robots to slip out of our control anytime soon, doing things they weren’t designed to do.

should they really be like us?

horses are very efficient at walking and running on four legs, yet we designed cars with wheels; the human brain is an extraordinarily efficient biological computer, and chip-based computers come nowhere near its efficiency. so why don't cars have legs, and why aren't computers modeled on our biology?

the goal of building a robot should not be just to imitate.

this is something i learned one day while sitting in a meeting with the technical leads at everyday robots. we were gathered around a conference table, having a lively discussion about whether our robots should have legs or wheels.

such discussions can easily degenerate into religious debates rather than discussions based on facts or science. some people are very adamant that robots should look like people, and for good reason: we design our living and working environments to accommodate humans, and we have legs, so perhaps robots should have legs too.

about 30 minutes later, vincent dureau, the most senior engineering manager in the room, spoke up. “i figured if i could get there, a robot could get there too,” he said simply, sitting in his wheelchair.

the room suddenly became quiet and the argument ended.

in reality, robot legs are very complex both mechanically and electronically. they don’t move very fast, they tend to make the robot unstable, and they’re not very energy efficient compared to wheels.

today, when i see companies trying to build humanoid robots, machines that strive to mimic human form and function, i often wonder whether it is a failure of imagination.

there are so many designs to explore that could complement human limitations, so why cling to imitation? vincent’s words remind us that we should focus our attention on the hardest and most impactful problems. at everyday robots, we tried to keep the form factor of our robots as simple as possible, because the sooner a robot could perform real-world tasks, the sooner we could collect valuable data.

what does it feel like to be surrounded by robots?

i was sitting at my desk when a one-armed robot with a rounded rectangular head rolled over, called my name, and asked if i needed it to clean up. i said yes and stepped aside.

a few minutes later, it picked up a few empty paper cups, a clear starbucks iced tea cup, and the plastic wrapper of a kind energy bar. it placed the items in a trash tray attached to the base, then turned to me, nodded, and moved on to the next desk.

this desk-cleaning service represented an important milestone: it showed we were making good progress on the remaining pieces of the robotics puzzle. the robots were starting to reliably use ai to recognize people and objects!

benjie holson, a software engineer and former professional puppeteer who led the team developing the service, had long been an advocate of the hybrid approach; he wasn't against end-to-end learning, he just preferred a "let's make them do something useful now" attitude.

i've gotten used to the robot moving around, doing chores like cleaning my desk, and occasionally i'll see a new visitor or an engineer who's just joined the team, watching the robot go about its business with a look of wonder and delight on their faces.

from their perspective, i realized how novel it all was. as our head of design, rhys newman, said (in his welsh accent) as he watched a robot pass by one day, “isn’t it a little weird that this has become the norm?”

everything ends, it's just the beginning

at the end of 2022, the debate between “end-to-end” and the “hybrid approach” was still heated.

peter and his team, along with our colleagues at google brain, had been working to apply reinforcement learning, imitation learning, and the transformer architecture to a range of robotics tasks. they made significant progress in showing how robots can learn in ways that are general, robust, and resilient.

meanwhile, the applications team, led by benjie, was combining ai models with traditional programming to prototype and build robotic services that could be deployed in human environments.

at the same time, a multi-robot installation created with dancer and choreographer catie cuan, later dubbed “project starling,” changed how i felt about these machines.

i noticed that people were drawn to these robots and felt surprise, joy, and curiosity. this made me realize that how robots move among us, and the sounds they make, will trigger deep human emotions; this will be one of the key factors in whether we welcome them into our daily lives.

in other words, we are on the cusp of delivering on our biggest bet yet: robots powered by ai that gives them the ability to understand what they hear (spoken and written) and translate it into action, or what they see (camera images) and translate it into scenes and objects they can manipulate.

after more than seven years of work, we have deployed a fleet of robots in multiple google buildings. the same type of robot is performing a range of services: automatically wiping cafeteria tables, checking conference rooms, sorting garbage, and so on.

however, in january 2023, two months after openai released chatgpt, google shut down everyday robots, citing overall cost issues.

although the project was indeed costly and time-consuming, the decision still came as a surprise to everyone involved. in the end, the robots and a small number of employees were transferred to google deepmind to continue the research.

the huge problem we set out to solve was a global demographic shift: aging populations, shrinking workforces, labor shortages. our breakthrough technology, we knew as early as 2016, would be artificial intelligence. the radical solution: fully autonomous robots that could help us with the tasks we perform constantly in our daily lives.

the robot didn’t show up in time to help my mother, who died in early 2021. the conversations i had with her toward the end of her life reinforced my belief that a future version of everyday robots would eventually arrive. and the sooner it arrives, the better.

so the question is: how will this change, this future, come about? i am both worried and curious.