
Embodied intelligence is being applied in the industrial field

2024-08-07



Produced by | Huxiu Think Tank

Author: Huang Siyu

Header image | Visual China

At a time of industrial transformation and emerging new productive forces, embodied intelligence has entered factories, bringing with it cutting-edge technologies such as large models, multimodal large models, and small visual and tactile models, and has begun to be put into practice in industrial manufacturing. However, complex and diverse industrial scenarios pose severe challenges. Which scenario should come first? How can a system switch smoothly between scenarios? How can the extensive, high-quality data its intelligent systems require be obtained?

To explore these questions, at 7 pm on July 30, 2024, Huxiu Think Tank's 502 online peer discussion invited Yin Zhi, Chief Advisor of the Shanghai Artificial Intelligence Technology Association, Fu Yihui, Ecosystem Director of Tashan Technology, and Dr. Zhao He, CTO of Weiyi Intelligent Manufacturing, to discuss innovative applications of large models in embodied intelligence and to share practical application cases of embodied intelligence and humanoid robots.

1. Integration of embodied intelligence with large models, and the technical challenges

Humanoid robots are the best carrier of embodied intelligence, and their growth depends on large models

The combination of large models and embodied intelligence has achieved some results and has broad prospects. Yin Zhi, Chief Advisor of the Shanghai Artificial Intelligence Technology Association, said that a large model can serve as a robot's brain: it can plan and reason, decompose a goal into sub-goals, and call the relevant functions. Its current role is limited, but its potential is great.
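The "brain" role Yin Zhi describes, decomposing a goal into sub-goals and calling the relevant functions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the skill names (`move_to`, `grasp`, `place`) are hypothetical, and `plan` is a hard-coded stand-in for a large model, not a real model call or any speaker's actual system.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical robot skills; a real robot would drive hardware here.
def move_to(target: str) -> str:
    return f"moved to {target}"

def grasp(target: str) -> str:
    return f"grasped {target}"

def place(target: str) -> str:
    return f"placed item on {target}"

# Registry of functions the planner is allowed to call.
SKILLS: Dict[str, Callable[[str], str]] = {
    "move_to": move_to,
    "grasp": grasp,
    "place": place,
}

def plan(goal: str) -> List[Tuple[str, str]]:
    """Stand-in for the large model: map a goal to ordered (skill, argument) sub-goals.

    Assumption: a real system would obtain this list from a model; it is
    hard-coded here so the sketch runs without any external service.
    """
    if goal == "put the part on the tray":
        return [
            ("move_to", "part"),
            ("grasp", "part"),
            ("move_to", "tray"),
            ("place", "tray"),
        ]
    raise ValueError(f"no plan for goal: {goal}")

def execute(goal: str) -> List[str]:
    """Run each sub-goal by looking up and calling the matching skill."""
    return [SKILLS[name](arg) for name, arg in plan(goal)]

if __name__ == "__main__":
    for line in execute("put the part on the tray"):
        print(line)
```

Keeping the planner separate from the skill registry mirrors the division described above: the model decides what to do, while existing functions carry it out.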

Dr. Zhao He, CTO of Weiyi Intelligent Manufacturing, believes that in industrial scenarios non-humanoid robots may be more suitable because of the nature of the work, while in everyday-life scenarios humanoid robots may be the better form.

Large models have revolutionized human-machine interaction, allowing people and machines to communicate naturally through text, voice, and images. That a machine can understand human intent and then act on it is a major breakthrough. Large models make interaction efficient, so robots can learn new tasks and pick up specific requirements and operating specifications.

Fu Yihui, Ecosystem Director of Tashan Technology, agrees that large models can empower embodied intelligence. He said that humanoid robots are the best carriers of embodied intelligence and that their growth depends on the development of large models. Creating humanoid robots that think and move like humans relies not only on large models but also on deep learning, motion control algorithms, and overall perception, including tactile and visual perception, the ability to understand complex environments, and logical reasoning.

Spatial intelligence, a term much discussed recently, starts from letting robots see the world so that they can better understand it, learn from it, and act within the world they observe. On this premise, achieving real2sim for embodied perception (reconstructing the real environment in a simulation environment) and letting robots perceive the world better is the prerequisite for building more flexible robots.

Embedding large models into embodied agents requires balancing energy consumption and benefits

An audience member asked, “Does embedding large models into intelligent robots require additional hardware support? Is it necessary to rebalance performance and energy consumption?”

Dr. Zhao He called this a good question. Embedding large models into intelligent robots does require considering hardware support and the balance between performance and energy consumption. In a broad sense, every embodied intelligent entity, humans included, must balance energy consumption against benefit. The human brain, for example, consumes little power yet is highly intelligent. Current large models, by contrast, consume a great deal of computing power and energy during training and inference, which cannot meet the needs of intelligent robots.

The way forward is to cut the computing power and energy that large models consume by improving model architectures and paradigms and by developing dedicated chips such as ASICs. Implementing software functions in hardware raises intelligence density, that is, the degree of intelligence per unit of chip area, and running model inference on such dedicated chips would make large model technology practical for embodied intelligent robots.

Multimodal large models will be an important module for embodied intelligence to achieve technological breakthroughs such as control and decision-making

For key technologies such as control, decision-making, and navigation, Yin Zhi believes the current problem with robots driven by large language models is that they must convert information into text before processing it. This is only a temporary transition. In the future, robots should make control decisions directly from their native perception of the environment; navigation will not require language, since vision itself carries a great deal of logic.

In the future, large models may raise their intelligence in a unified multimodal form, much as the human brain has modules responsible for different functions. For now, multiple small models may each control a different function, and a large model commanding the robot is a transitional form. Multimodal large models are the trend, but they will take time. Combining small models with large models may improve capability in the meantime, depending on how large models develop. At present, natively multimodal data for training large models is still lacking.

Zhao He pointed out that, for control, introducing visual servoing into industrial robots can connect low-level control with high-level tasks and greatly reduce application cost. For navigation, the premise is perception, including multimodal information such as touch and temperature; integrating multimodal information effectively is technically challenging, but it is a promising direction. Through multimodal, multi-turn dialogue, a machine can understand a person's intent and task requirements and complete a flexible line changeover. This would be revolutionary progress, and it could also provide a baseline standard of intelligence for smarter industrial robots, enabling standardized production together with flexible on-site adaptation.
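As a rough illustration of how visual servoing links a task-level target to low-level motion, the sketch below applies a proportional control law to the pixel error between a tracked image feature and its desired position. It is a deliberately simplified sketch, not Weiyi's method: it assumes camera motion shifts the feature one-to-one in the image (an identity interaction matrix), and the function names, gain, and tolerance are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

def servo_step(feature: Point, target: Point, gain: float = 0.5) -> Point:
    """Proportional law v = -gain * (feature - target), in pixel units."""
    error_x = feature[0] - target[0]
    error_y = feature[1] - target[1]
    return (-gain * error_x, -gain * error_y)

def run_servo(
    feature: Point,
    target: Point,
    gain: float = 0.5,
    tol: float = 1.0,
    max_steps: int = 100,
) -> Tuple[Point, int]:
    """Iterate until the feature is within `tol` pixels of the target.

    Assumption: each commanded velocity moves the feature by exactly that
    amount in the image, which a real camera-robot system only approximates.
    """
    steps = 0
    while steps < max_steps:
        error_x = feature[0] - target[0]
        error_y = feature[1] - target[1]
        if (error_x ** 2 + error_y ** 2) ** 0.5 <= tol:
            break
        vx, vy = servo_step(feature, target, gain)
        feature = (feature[0] + vx, feature[1] + vy)
        steps += 1
    return feature, steps

if __name__ == "__main__":
    final, steps = run_servo((100.0, 50.0), (0.0, 0.0))
    print(final, steps)
```

The high-level task only supplies the target position in the image; the loop turns the remaining visual error into motion commands, which is the connection between task and control that the paragraph above refers to.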

Fu Yihui agreed with Dr. Zhao He. He noted that supplying tactile sensors for humanoid robots raises problems such as fusing tactile data with visual data, achieving high-precision perception, handling complex decision-making, and ensuring robustness. Of these, fusing tactile data with other multimodal perception data is the key to a breakthrough in embodied perception. Multimodal large models that incorporate tactile data help robots interact in complex scenes. Traditional algorithms can handle some scenes after extensive training, but untrained scenes remain, low-probability scenes are hard to cover, and generalization is weak.

2. Application scenarios and value of embodied intelligence in the industrial field

Industrial manufacturing applications start with solving the flexible line changeover problem

Industrial robots have been in use for a long time, but mostly as fixed automation, with problems such as complex commissioning, high cost, and low efficiency. The core task for intelligence is to solve flexible line changeover. Zhao He shared several specific industrial cases. On an assembly line, for example, his company helped customers address high changeover costs and the difficulty of adjusting production lines. It originally took three engineers nearly a week to adjust a line so that the whole line could run at full capacity, and intelligent industrial robot technology is expected to improve this.

In quality inspection, the company supplied AI-based appearance inspection machines to a large factory with more than 3,000 employees, successfully replacing more than 2,000 quality inspection workers. This reduced labor costs, and the machines performed no worse than human inspectors while working 24 hours a day.

In post-processing after defect detection, workers repairing die-castings have had to grind blindly, which hurts both efficiency and quality. To address this, intelligent industrial robot technology coordinates multiple robots into an intelligent workstation that performs defect detection, trajectory planning, and polishing in sequence. This overcomes the limits of manual operation and benefits the process flow for large die-castings.

The strong flexible line changeover capability of embodied intelligence will improve production efficiency in industrial manufacturing

Gains in enterprise efficiency show up in two areas: labor costs, and production and operating efficiency. Fu Yihui pointed out that, on labor costs, the tipping point for commercialization comes when the price of a robot compares favorably with the labor cost it replaces, and that the humanoid robot industry may grow larger than the automotive market. On production efficiency, robots can work around the clock without interruption and learn far more efficiently than humans. Through optimized decision-making and precise control, they can improve the efficiency of factory or business operations.

Zhao He sees common pain points across manufacturing. The first is people: recruiting is difficult, and skilled workers are hard to retain. Enterprises want to reduce their reliance on labor and cut costs. Weiyi Intelligent Manufacturing built AI-based appearance inspection machines for a large factory with more than 3,000 employees, replacing 2,000 quality inspection workers and reducing labor costs. The machines perform no worse than humans and can operate 24 hours a day.

Second, flexible line changeover is costly. Production, supply, and sales models have changed, and small-batch, high-mix orders have increased, so the traditional production model can no longer meet market demand. Intelligent production equipment such as intelligent industrial robots needs fast, low-cost changeover capability to address these new pain points and help enterprises improve production efficiency.

Prioritize dangerous scenarios and enter more complex scenarios as generalization capabilities increase

Opinions differ on which scenarios embodied intelligent robots should enter first. Yin Zhi believes embodied intelligent devices, such as robotic arms and logistics robots, are already widely used in manufacturing. In the future they will be used more in assembly, logistics, and warehousing. Service robots will also become increasingly common in homes, shopping malls, communities, and similar settings, and self-driving cars are another category. He thinks deployment should start with jobs that humans are unwilling to do or unsuited for, or that are dangerous or tedious.

Fu Yihui believes the order cannot simply be industry, then commerce, then the home. The first deployments should be those with relatively simple scenarios and capability requirements. For example, installing and routing in-vehicle wiring harnesses on automobile production lines still relies on manual labor, and solving it requires humanoid robots that fuse touch with multimodal perception. Commercial scenarios include dispensing medicine in place of pharmacy clerks, restocking and rearranging goods in supermarkets, and refueling and charging at gas stations. There are also dangerous or special scenarios in which robots can replace people doing hazardous work. Ultimately, as the generalization capabilities of humanoid robots improve, they will enter more complex collaborative and interactive home scenarios.

With fast acquisition and learning and improved generalization capabilities, many complex scenario problems can be solved

On how to improve the generalization and universality of skills so they can be applied across scenarios, Fu Yihui spoke from the perspective of tactile perception: collecting tactile data in real scenes and using it to drive simulation training improves the generalization of a robot's dexterous manipulation and grasping. In industrial scenarios, where the objects to be grasped are complex and varied, better generalization of tactile and grasping ability is needed.

Zhao He believes that there is currently no way to allow robots to do everything without further learning. In industrial scenarios, generalization and universal capabilities are reflected in basic intelligence, that is, through technologies such as large models, robots are able to quickly learn new task skills. If this can be achieved, it will be a revolutionary advancement in the application of intelligent industrial robots in industry.

Fu Yihui believes humanoid robots need chain-of-thought reasoning and a degree of generalization to meet the demands of different scenarios. For example, Google's RT-2 combines an LLM with a vision Transformer to integrate perception and control, and chooses an action strategy based on its reading of the environment and the intent, which improves the robot's ability to execute tasks.

3. Exploration of Data Collection and Training Methods for Embodied Intelligence

An audience member asked how to obtain large amounts of data and whether there are innovative ways to do so. Zhao He believes the development of the industrial Internet has, in effect, accumulated the data needed for industrial large models to emerge. In practice, data collection, organization, automation, and intelligent operation should be built into products and services as a core concern.

There is currently no mature, widely accepted training method for embodied intelligence. For intelligent industrial robots, the hope is that large models can understand tasks and generate control instructions from video of manual operations and similar visual input.

Training for embodied intelligence today mostly means training embodied agents in a simulation environment, or training on simulated or generated data. The reason is that training embodied intelligence models or intelligent robots on real data runs into difficult data collection and insufficient data volume.

Yin Zhi suggested that third-party professional data service providers, such as data annotation companies, may evolve into AI trainers or service providers in the future. In China, the labor cost advantage is more obvious, and it is expected to form an industry based on multimodal data and intelligent training services.

Conclusion

Although the integration of embodied intelligence and large models has achieved some results, optimizing the balance between energy consumption and benefit and bringing multimodal large models to mature application still require continued research. Solving the key flexible line changeover problem, improving generalization to handle complex and diverse industrial environments, establishing mature and effective training methods, and making full use of third-party data services are all key to promoting the widespread application of embodied intelligence in industry.

Throughout the event, online attendees took an active part in the exchanges. They included people from Audi China, NIO, Li Auto, Dongfeng, Schneider Electric, Amazon Web Services, Horizon Robotics, Huawei Terminal, Baidu, China Telecom, and other companies, as well as from institutions such as CICC Capital, Dingjie Software Investment, Huaruan Holdings, Yizhuang State Investment, and China Unicom Industrial Internet of Things. Attendees and guests held in-depth dialogues, exchanged practical experience, and discussed business cooperation, bringing the 502 online peer discussion to a successful close.


About Huxiu Think Tank: Huxiu Think Tank is a new type of research service organization focused on enterprise digitalization and AI innovation practice. It provides insightful research reports, case selections, online meetings, offline events, and site visits to both sides of the industrial intelligence process, supporting corporate executives in making sound decisions on intelligence and digitalization. The core value it provides: timely, high-quality insight into technologies, industries, peers, and competitors; important reference points for decision makers on technology and product strategy, industry planning, and solution selection; and help for the market in fully understanding the state of cutting-edge technologies, the industries they affect, and future trends.