
Post-00s CEO Yang Fengyu: Returning to China to start a business, he built the first "mass-producible" humanoid robot in five months

2024-08-05


Humanity is experiencing explosive growth in the field of artificial intelligence, and almost every step the technology takes into the unknown attracts an astonishing amount of attention.

As the boundaries of artificial intelligence expand, innovations and divergences in the technical routes of important tracks coexist. The judgments and choices of technology pioneers influence the footsteps of many followers.

In the past year, Synced exclusively introduced outstanding companies such as Dark Side of the Moon (Moonshot AI), Biodata, Aishi Technology, and Wuwen Core, giving each of them its first long-form, "10,000-word" interview on the internet. At a stage when technology routes have not yet converged, we have seen the leading power of AI entrepreneurs who truly have conviction, courage, and systematic understanding.

Therefore, we launched the "AI Pioneers" column, hoping to continue to find and record entrepreneurs with leadership qualities in each segment of artificial intelligence in the AGI era, to introduce the most outstanding, high-potential startups on the AI track, and to share their most cutting-edge and distinctive insights into AI.

Author: Jiang Jingling

Machine Heart Report

Even though young academic geniuses have become one of the mainstream backgrounds of AGI company founders today, Yang Fengyu, who was born in 2000, is still surprisingly young.

Yang Fengyu, 23, majored in computer science at the University of Michigan and went on to doctoral studies in computer science at Yale University; last year he started his own embodied intelligence robotics business.



In 2024, UniX AI, an embodied intelligence company founded by him, completed the research and development and manufacturing of a wheeled humanoid robot in five months. This robot, which has functions such as "post-meal cleaning" and "laundry", will start mass production in September and will be sold to the public.



This is a remarkably fast pace of commercialization at a time when many embodied intelligent robots are still at the laboratory stage. In Suzhou, UniX AI's robot mass-production factory covers more than 2,500 square meters.



This company, which almost no one had heard of last year, has recruited many senior technical people from the robotics industry over the past six months. "An R&D director from a leading service-robot company is helping us build the chassis, and top talents from leading humanoid robot companies are responsible for our hardware." In July 2024, Professor Wang Hesheng, a well-known robotics expert at Shanghai Jiao Tong University, announced that he had officially joined UniX AI as chief scientist.

In the first technology demonstration video released by UniX AI, a wheeled humanoid robot named Wanda completes tasks such as picking up tofu, helping to sort clothes, and carrying clothes to the washing machine. UniX AI seems to have found a solution to the flexible-object manipulation problem that current embodied intelligence companies find hard to solve.

"I don't think there is anything wrong with being young. From a technical perspective, many of the current new technologies and new products are created by young people with a strong academic background." What surprised us was that as a post-00s, Yang Fengyu showed a maturity beyond his age in his conversation and had a very clear understanding of the technical stages of company management and embodied intelligence.

Our curiosity about UniX AI centers on how a company with almost no presence in venture-capital news could develop so quickly; as one of the few embodied intelligence companies founded by someone born after 2000, how did UniX AI get from 0 to 1? And what is UniX AI's roadmap toward the endgame of embodied intelligence?

With these questions, Synced had its first public media conversation with Yang Fengyu since he started his business.

A post-00s from Yale

Starting an embodied intelligence business

Synced: Have you graduated now?

Yang Fengyu: I went directly into Yale's doctoral program after my undergraduate studies, and I have basically met all the requirements for the doctoral thesis. This year alone I had four papers at CVPR, and counting the others, I have more than ten papers in total at top conferences in artificial intelligence and robotics.

Synced: You have a lot of energy.

Yang Fengyu: (laughs) I often stay up until 3:30 in the morning, and a while ago I even had to get an IV drip. Mainly it's because when the team is working together I rarely look at the clock, and by the time I look up it's already very late.

Synced: When did you first start thinking about starting a business?

Yang Fengyu: I have always believed that starting a business requires “the right time, the right place, and the right people.”

Last year, we saw great progress in perception technology, including multimodal large models and foundation models spanning vision, language, and touch, which showed us that our goals were within reach. In addition, the country has introduced a series of supporting policies that create a good environment for entrepreneurship. This is the "right time".

"Geographical advantage":There is no doubt that general humanoid robots are the next development direction after new energy vehicles. China has incomparable advantages in the supply chain, and there are also many high-tech talents in the Yangtze River Delta.



At the beginning, we did some research to find out what stage the robotics industry's engineering had reached, where the market demand lies, what problems the previous generation of robots solved, and where the future opportunities are.

The key to success is finding the right people. This year we officially formed a team and quickly assembled experts from many fields, including an R&D director from a leading sweeping-robot company and top talents from leading humanoid robot companies who are responsible for our hardware. On the algorithm side, I recruited a group of people in the United States and Europe, including some of my classmates and senior schoolmates. This is the "right people".

As founder and CEO, the most important thing is to pool resources. UniX AI is a global company that combines the strengths of different countries in robotics software, hardware, and the supply chain. At the same time, we have international plans and will realize the company's vision of Robots For All through sustained effort across one-year, three-year, and five-year plans.

Synced: Could you please briefly introduce your academic experience?

Yang Fengyu: I attended primary school and high school in China, then studied computer science at the University of Michigan. There I first came into contact with vision and machine learning, and later, influenced by my advisor's work on multimodal learning, I began to study vision and touch.

During my undergraduate studies, I published five papers on robot vision and touch. Among them, "Touch and Go: Learning from Human-Collected Vision and Touch" introduced the world's largest visual-tactile dataset and was accepted at NeurIPS, a top conference in artificial intelligence and machine learning.

In another work, we were the first to introduce diffusion models for translating between vision and touch in both directions; the results were accepted at ICCV.

For robots, touch is very important. It is difficult to tell with the naked eye whether a piece of clothing is made of polyester, cotton or silk. Only by touching it can we distinguish the different textures. In addition, in some delicate activities, such as plugging a charging cable into a charging port, constant adjustments are also required through touch, which cannot be done by vision alone.

Synced: Then you came to Yale.

Yang Fengyu: Because of my work on robot vision and touch, especially on vision-touch translation and its generalization with large language models, I was named an Outstanding Undergraduate Scientist by the Association for Computing Machinery, the first in my school's history. I eventually chose Yale University for my doctoral studies.

During this period, I published a number of papers, including "Binding Touch to Everything: Learning Unified Multimodal Tactile Representations" (CVPR 2024, pp. 26340-26353). In this paper, I proposed UniTouch, the world's first large tactile model that works across multiple different vision-based tactile sensors and connects touch to modalities such as vision, language, and sound.

Another paper"Tactile-Augmented Radiance Fields" (CVPR, 2024, pp.26529-26539) established the world's first 3D visual-tactile model TARF that can be generalized at the scene level. The generalization capability of the UniX AI humanoid robot is also based on this model.



Synced: Do you think being a post-00s generation is more advantageous or more disadvantageous to you?

Yang Fengyu: For a startup, the founder is its soul. Many people think I am very young, but I don't think being part of the post-00s generation is a problem.

From a technical perspective, young people are a very strong driving force in this round of technological change and new tracks. Many new technologies and products today are created by young people, especially in high-tech industries where the entry barriers are relatively high. One member of Sora's core team is a classmate of mine who already showed strong technical ability at the University of Michigan.

From the perspective of cognition and experience, I think learning quickly and correcting mistakes quickly is also a path. Another thing is personality. You must be willing to persist and be resilient, try every possible way, and have the spirit of "cutting a path through mountains and building bridges over rivers". After all, entrepreneurship is all about results.

Of course, there are also many experienced experts in the UniX AI team. They have rich experience in structure, electronics, etc. Only through effective cooperation between us can we launch our products in a short time.

Visual-tactile sensing + manipulation

Improving the generalization ability of robots

Synced: Why is improving the sense of touch important for robots?

Yang Fengyu: Humans are multi-sensory animals. Your action decisions are usually the result of the combined influence of information transmitted by multiple senses. The same is theoretically true for intelligent robots.

Touch is one of the most important kinds of sensory information. Unlike visual feedback, it is generated after the robot interacts with the environment, whereas visual feedback is generated before. When a robot grasps an object, the object deforms; essentially, the incremental information the robot gains from that interaction comes from touch, from what the object feels like.

Tactile information enables robots to perform better on more complex, delicate tasks and greatly improves the success rate of grasping. The effect is most obvious when grasping flexible objects, where it can be called a qualitative leap: from being basically unable to complete the task to being able to complete it.

For example, our wheeled humanoid robot Wanda can already handle tasks such as pinching eggs, grasping tofu, and doing laundry. These tasks are hard to execute with vision alone, because the robot gets no feedback after contact.
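To illustrate why post-contact feedback matters, the sketch below shows a minimal closed-loop grasp controller that raises grip force only as needed to stop slip while capping deformation so a soft object is not crushed. The Gripper and TactileSensor interfaces, thresholds, and loop rate are hypothetical placeholders, not UniX AI's actual APIs.

```python
import time

class Gripper:
    """Placeholder gripper driver; set_force/close are illustrative names."""
    def set_force(self, newtons: float) -> None:
        print(f"gripper force set to {newtons:.1f} N")

    def close(self) -> None:
        print("gripper closing")

class TactileSensor:
    """Placeholder fingertip sensor; a real driver would go here."""
    def slip_detected(self) -> bool:
        return False   # pretend the grasp is stable

    def deformation(self) -> float:
        return 0.1     # normalized contact deformation, 0..1

def grasp_soft_object(gripper, sensor, start_force=0.5, step=0.2,
                      max_force=5.0, max_deformation=0.3, timeout_s=2.0):
    """Close with the smallest force that prevents slip while capping
    deformation so soft objects (tofu, eggs) are not crushed. Both signals
    only exist after contact, which is why vision alone is not enough."""
    force = start_force
    gripper.set_force(force)
    gripper.close()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        time.sleep(0.02)                               # ~50 Hz control loop
        if sensor.deformation() > max_deformation:
            force = max(start_force, force - step)     # squeezing too hard
        elif sensor.slip_detected() and force < max_force:
            force = min(max_force, force + step)       # object is slipping
        else:
            return force                               # stable grasp reached
        gripper.set_force(force)
    return force

print("final grip force:", grasp_soft_object(Gripper(), TactileSensor()))
```

The point of the loop is simply that slip and deformation signals only become available after contact, so no amount of pre-grasp vision can replace them.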



Why do robots today mainly rely on vision to make judgments? Because, compared with other data, visual data is the most direct, the easiest to obtain and train on, and available in large quantities. But as robots move further toward embodiment, relying solely on vision is definitely not enough.

Because touch is sensory information that comes from interaction, using it well also means that robots can gradually learn through real interactions with the world and become more usable and more generalizable.

Synced: Why does the addition of touch improve the robot’s control over flexible objects? What’s the principle?

Yang Fengyu: The principle is that flexible and rigid objects differ greatly in grasping and manipulation. A rigid object's shape barely changes before and after contact, so a grasp is relatively easy to judge through visual observation. For a flexible object, however, it is hard to tell from observation what will happen after contact: there is a lot of occlusion and deformation during grasping, and those deformations are difficult to predict accurately from vision.

For example, when you hold a tissue, once you hold it in your hand, it will completely block your vision, and vision can hardly provide effective information to determine how to grab or operate it. In this case, you can only rely on physical information such as touch to complete perception.

Synced: Why is it that most of the time I don’t even need to try to grab an object, I just know how to do it?

Yang Fengyu: That's because, as a human, your senses are so well integrated that you don't even notice you are using tactile information. You have been accumulating tactile data for more than 20 years, so you no longer know which sense is supporting you in completing the task.

Synced: For most robotic tasks, what are the differences in the contribution ratios of different senses? At this stage, how high is the priority of touch?

Yang Fengyu: For most robot tasks, different senses contribute different proportions in the three steps of perception, reasoning and decision-making, and action.

At the perception level, in the early stage we mainly rely on vision and point clouds to obtain global information, for example the layout of the whole home, where the water is, and so on. At present, the problem of perceiving global information with large vision models and 3D models has been basically solved.

At the decision-making level, language is mainly used to introduce human prior knowledge. For example, after receiving the instruction to get water from the refrigerator, the robot can break down the task and know that the first step is to open the refrigerator, the second is to get the water, and the third is to close the refrigerator. This prior knowledge comes from the vast amount of internet data.

At the action level, vision can help the robot determine where to grasp. Tactile information, however, plays an important role in determining the grasping force. Under occlusion, such as when holding tofu, it is difficult for vision to judge the grasp accurately, while touch provides the key information that lets the robot grasp precisely.

In addition, touch plays an important role in some scenarios that require fine force control, such as pinching eggs, grabbing tofu, etc., as well as in some scenarios that require judging the deformation of objects and force feedback.



In general, the contribution of different senses varies from task to task. In grasping rigid objects, vision may account for a relatively high proportion; in grasping many flexible objects, touch is more critical, and it can even be called a qualitative leap from being basically unable to complete the task to being able to complete it.
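As a rough summary of the three-layer division described above (vision and point clouds for global perception, language for task decomposition, vision plus touch for execution), here is a schematic sketch in code. Every component is an illustrative placeholder; it is not a description of UniX AI's actual software stack.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Subtask:
    description: str    # e.g. "open the refrigerator door"
    target_object: str  # object the step acts on

def perceive_scene(rgb_frames, point_cloud) -> dict:
    """Perception layer: vision + point clouds give global information
    (room layout, object poses, free space)."""
    # Placeholder: a real system would run detection / 3D reconstruction here.
    return {"fridge": {"pose": (1.2, 0.4, 0.0)}, "bottle": {"pose": (1.2, 0.6, 0.9)}}

def plan_with_language(instruction: str) -> List[Subtask]:
    """Decision layer: a language model injects human prior knowledge by
    decomposing the instruction into ordered steps."""
    # Placeholder decomposition for "get water from the refrigerator".
    return [
        Subtask("open the refrigerator", "fridge"),
        Subtask("take out the water", "bottle"),
        Subtask("close the refrigerator", "fridge"),
    ]

def execute_subtask(subtask: Subtask, scene: dict) -> bool:
    """Action layer: vision proposes where to grasp; tactile feedback
    regulates how hard to grasp, especially under occlusion."""
    # Placeholder: motion planning + the tactile grasp loop would run here.
    print(f"executing: {subtask.description} (target: {subtask.target_object})")
    return True

def run(instruction: str, rgb_frames=None, point_cloud=None) -> None:
    scene = perceive_scene(rgb_frames, point_cloud)
    for subtask in plan_with_language(instruction):
        if not execute_subtask(subtask, scene):
            break  # a real system would replan or ask for help

run("get water from the refrigerator")
```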

Synced: Are the barriers to entry in the field of touch high enough? What are the difficulties in implementing it in robotic products?

Yang Fengyu: I think it is relatively high. Before 2023, touch had always been a very niche modality; compared with vision and hearing, very few people worked on touch.

In the early days of tactile work, sensors were the biggest challenge. At that time not many people around the world were working on this kind of data, and how to build the sensors was a key issue.

Secondly, there is the issue of how to parse tactile information, which involves two levels: algorithms and data. On the data side, most of the world's tactile sensing data has not been made public. This may be due to the proprietary nature of many robot setups or other reasons, which makes robotics data less open than vision data. We therefore keep working on the dataset problem and are committed to promoting the continued public release of tactile sensing datasets worldwide.

At the algorithmic level, touch is different from vision and involves a lot of physical prior knowledge. For example, the force conditions can be determined from the markers on the sensor, but this information is not as easy to interpret and recognize as visual information.

We once ran an experiment whose results showed that generated tactile signals are very hard for people to tell apart: without specific training, it is difficult to distinguish the tactile sensor signals of different objects. We are actively working to lower this barrier and encourage more people in academia to take part, so as to advance the entire tactile field.
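As one concrete example of the "markers on the sensor" idea, vision-based tactile sensors often print a grid of dots on the gel; tracking how those dots move between a no-contact reference frame and the current frame gives a displacement field that serves as a rough proxy for shear load. The sketch below uses OpenCV optical flow; the gain constant and synthetic frames are illustrative assumptions, not a calibrated pipeline.

```python
import cv2
import numpy as np

def track_markers(reference_gray: np.ndarray, current_gray: np.ndarray,
                  marker_points: np.ndarray) -> np.ndarray:
    """Track marker dots from the no-contact reference frame to the current
    frame with Lucas-Kanade optical flow; returns per-marker (dx, dy)."""
    pts = marker_points.astype(np.float32).reshape(-1, 1, 2)
    tracked, status, _ = cv2.calcOpticalFlowPyrLK(
        reference_gray, current_gray, pts, None,
        winSize=(21, 21), maxLevel=2)
    displacement = (tracked - pts).reshape(-1, 2)
    displacement[status.ravel() == 0] = 0.0   # ignore markers that were lost
    return displacement

def shear_force_proxy(displacement: np.ndarray, gain: float = 0.05) -> np.ndarray:
    """Rough physical prior: mean marker displacement (in pixels) scales
    approximately with tangential (shear) load. `gain` stands in for a
    per-sensor calibration constant and is purely illustrative."""
    return gain * displacement.mean(axis=0)   # (fx, fy) in arbitrary units

# Toy usage with synthetic frames; a real sensor would supply the images.
reference = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
current = reference.copy()
markers = np.array([[50.0, 60.0], [100.0, 120.0], [200.0, 180.0]])
print(shear_force_proxy(track_markers(reference, current, markers)))
```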



Synced: If tactile data is not only scarce but also expensive to collect at scale, how can it be scaled up?

Yang Fengyu: Our previous work was actually aimed at exactly this problem: how to scale up when large-scale data collection is hard to achieve:

The first step is to connect vision and touch and predict touch from vision. Even in scenarios without tactile data collection, information such as vision and language can be used to infer tactile signals (a minimal sketch of this idea appears after the fourth point below).

For example, after collecting tactile information of tables of the same type and material, when we go to a new home or office scene, we can infer its tactile signals through visual and language information even if we have not actually touched the new table. In this way, we can expand the available data set even without real physical contact. However, this method may be somewhat different from the real signal because it is predicted.

Second, we continue to promote the public release of tactile datasets. By making datasets public, more people can participate in tactile research and development, which advances the whole field.

Third, at the algorithm level, we strive to lower the threshold for interpreting tactile information. For example, by adding markers to the sensor and observing how the markers move under different forces, we can use this physical prior knowledge to analyze tactile information more effectively.

Fourth, we are committed to combining different kinds of information, such as vision, touch, and language, to complete various tasks. By fusing multimodal information, we can to some extent make up for the shortage of tactile data and improve the model's generalization and adaptability.
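For the first point above (predicting touch from vision), a minimal version is a small regressor trained on scenes where paired visual-tactile data exists and then applied to objects that have only been seen, not touched. The feature dimensions, model, and training loop below are assumptions for illustration only; as noted in the interview, the predicted signal is an approximation of a real measurement.

```python
import torch
import torch.nn as nn

class VisionToTouch(nn.Module):
    """Toy regressor: given a visual feature of the contact region (e.g. from
    a frozen image backbone), predict a tactile embedding. It is trained only
    where paired visual-tactile data exists, then applied to unseen objects to
    obtain a predicted, and therefore approximate, tactile signal."""
    def __init__(self, visual_dim: int = 512, tactile_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(visual_dim, 256), nn.ReLU(),
            nn.Linear(256, tactile_dim),
        )

    def forward(self, visual_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(visual_feat)

model = VisionToTouch()
optim = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

def train_step(visual_feat, tactile_target):
    """One supervised step on paired data (e.g. tables whose touch was recorded)."""
    optim.zero_grad()
    loss = loss_fn(model(visual_feat), tactile_target)
    loss.backward()
    optim.step()
    return loss.item()

# Toy usage: random tensors stand in for real visual features and tactile targets.
print(train_step(torch.randn(16, 512), torch.randn(16, 128)))

# At deployment, a new table seen only by the camera gets a predicted tactile
# embedding -- useful for downstream planning, but not a real measurement.
with torch.no_grad():
    predicted_touch = model(torch.randn(1, 512))
```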



Synced: Is large-scale data collection possible? What conditions are required?

Yang Fengyu: I think this is actually the bottleneck of the entire development of embodied intelligence. I personally believe that large-scale collection can be achieved, but there is a commercialization process here.

When robots enter thousands of households and reach a certain number, you will be able to collect enough data to support generalization to more scenarios. Of course, you can never collect every point, so the question of "large scale" will always exist. The essence of machine learning is to fit and predict a dense distribution from sparse samples.

In terms of data, we do not reject simulation, but I think a certain amount of real machine data is a necessary condition for realizing embodied intelligence.

Synced: What are the key technical indicators of a large tactile model?

Yang Fengyu: Like any other large model, a tactile model is measured by its performance on different downstream tasks. Touch and Go, the world's largest visual-tactile dataset, built by my team, is one of the important general benchmarks worldwide for visual-tactile pre-training models for robots.

Wanda, the embodied intelligent robot

Mass production starts in September

Synced: After deciding to start your own business, what kind of embodied intelligence company do you plan to build?

Yang Fengyu: The essence of entrepreneurship is to create value for society. UniX AI is one of the few embodied intelligent robot companies in the world that makes the consumer (To C) market its first strategy.

Although there is still a long way to go for To C, the potential behind it is huge. From an industry perspective, humanoid robots have entered a period of hardware + AI integration, are developing rapidly, and are becoming more and more practical. And I am optimistic that this integration will happen much faster than industry insiders originally estimated.

Population aging, low birth rates, labor shortages: these are problems the whole world faces. The responsibility of enterprises is to solve problems for society. This is UniX AI's opportunity and value, and also the original motivation for my entrepreneurship. The general landing path on this track is basically industry, then commerce, then the home. We will cover the commercial and home scenarios, which are also the main scenarios for serving To C users.

UniX AI's vision is Robots For All: to create general-purpose humanoid robots that lead in both motion capability and intelligence, enabling physical labor and intelligent companionship.

Synced: Why did you choose to work on a home environment in the first place?

Yang Fengyu: In fact, we are not limited to home scenarios, we also work in general commercial scenarios, such as offices.

Technically, To B scenarios are relatively less difficult, with high repetition and less demanding generalization requirements. However, To B scenarios usually rest on a strong labor-substitution logic, which places very high demands on the robot's speed and operating accuracy.

Home scenes are very complex and ever-changing. Each home is a small ecosystem, which requires robots to have strong generalization capabilities. This of course places higher demands on our products. At the same time, we will also have many L2-level functions in home scenes, which will further improve the adaptability and playability of the product in complex scenes.

In general, our technology stack can cover both To B and To C. If we can handle the home scenario well, I think other scenarios will be easy to handle. Starting by gnawing on the hardest bone not only reflects UniX AI's technical strength but is also our strategic path into the market.

Synced: Will you also do To B scenarios like factories?

Yang Fengyu: We don't rule out any scenario. UniX AI's modular hardware design can adapt to many different scenarios relatively easily. At the same time, we have a set of motion-primitive algorithms that decouple perception from manipulation to make the most of our data, so our ability to migrate across scenarios is very strong. Although every product has its boundaries, we are willing to try and expand into various scenarios, and we are already running through some important commercial scenarios that help consumers.



Synced: What is the so-called supply chain cost advantage?

Yang Fengyu: We have a group of experienced supply-chain management experts on our team. They have mastered mass-production-level cost control and can apply it to the robot supply chain. Although the robot industry has not yet entered large-scale price competition, we have controlled costs at the mass-production level from the beginning to ensure the product can reach a price acceptable to consumers. We are confident that, through effective cost control, our products will be highly price-competitive and will strongly support the company's development.

Synced: What’s the price range for the upcoming products?

Yang Fengyu: It is not convenient for me to disclose that now, but I can guarantee it will be a surprising price.

Synced: How do you plan to reach the end goal?

Yang Fengyu: The logic behind our path to the end goal is very simple: a certain amount of high-quality real data is needed, and the key lies in how to obtain that data. Take autonomous driving as an example: Tesla's FSD got to where it is because it spent six to eight years with cars running on the road and collecting data.

The robotics industry is different: people expect robots to do things on their own. So we first developed several single-point scenario functions to make everyone feel that robots are useful or fun, at a price within their means, so that people will be willing to buy them.

Our supply chain has advantages and can bring prices down, which is a very critical point. Through continuous user feedback we keep optimizing and iterating our products, and will ultimately create a general-purpose embodied intelligent robot.



Synced: What is the difficulty and significance of mass production of robots?

Yang Fengyu: Making a demo is easy: as long as one unit works in the laboratory, it counts as a success. The difficulty of mass production is that it is not one unit but a hundred or a thousand units actually entering users' homes. It tests the product's data security, operational stability, and the reliability of the underlying control. It requires a strong after-sales team and a continuously iterating technical team. In addition, the manufacturing process is very important, and it is a key indicator of mass-production capability.

Its significance is beyond doubt. On the one hand, it reflects the competitiveness of the supply chain; on the other, it demonstrates the maturity of the technology. Whoever dares to try first, and does it quickly and well, can gain a certain first-mover advantage through mass production.

Synced: After you decided to start a business, what are your initial team building ideas and team formation status?

Yang Fengyu: From 0 to 1, the founding team is crucial. I am used to having a top-level plan first and then deploying it level by level, like a waterfall, from top to bottom: first find the most critical core people, and once things are up and running, extend downward, continuously improving the team and getting the whole flywheel turning.

From the end of last year to now, our team has developed very fast and has iterated three generations of products. The team size has taken shape, but we will continue to adjust and improve according to needs in the future to make the company more and more competitive.

Recruiting talent is one of the most important things for a startup. I have personally met most of the people in our company. Often the CEO is not only the chief executive officer but also the "chief meaning officer", who has to explain to peers what we do and why it matters. Getting them to agree and join the journey is very important.

At the same time, my management radius was very large and the management granularity was very fine during this stage, which was very hard but necessary. Only when I had a full grasp of the situation and confirmed that the company was heading in the right and stable direction, could I spend more time on other aspects.

Synced: How do you attract these talents?

Yang Fengyu: What essentially attracts everyone is the path to the endgame of embodied intelligence, and the question of how to get there.

We have several highlights. First, we have a very strong supply chain cost advantage. Second, our team has strong execution and very fast iteration speed. Many candidates may think we are just so-so when they first get to know us, but when they come back a few weeks later, they find that the scenarios have been run through and progress is very fast. We also have some talents from top domestic robotics companies who have actively asked to join us.

Synced: What is the current source of funding?

Yang Fengyu: We will disclose it at the appropriate time.

Synced: Are there any plans for external financing?

Yang Fengyu: Currently, investor feedback is very enthusiastic. We welcome investors who share the same vision of universal embodied intelligence as us to work with us in the long term.

Synced: Could you please give us more details about your upcoming products and future marketing plans?

Yang Fengyu: The robot we are about to mass produce is called Wanda, a wheeled humanoid dual-arm robot. In the first technical video we released, you can see some of its features, but that's not all. When we sell it to consumers in September, there will be more surprise details.

Ultimately, UniX AI hopes to deliver to consumers a universal embodied intelligent robot that not only serves the family, but can also accompany people to more distant places and provide more functions. This requires us to continue to develop technology, and also requires collaboration and co-creation between the company and users. A journey of a thousand miles begins with a single step, so let's start with the first step.