
From cooking to suturing! Stanford team's "AI da Vinci" is hard at work becoming a surgeon

2024-07-31



New Intelligence Report

Editor: Editorial Department

【New Intelligence Introduction】The authors of Stanford's shrimp-frying robot have released a new work! Through imitation learning, the da Vinci robot has learned to perform "surgery" on its own: lifting tissue, picking up needles, suturing, and tying knots. Most importantly, all of these actions are completed autonomously.

The author of Stanford's Shrimp-Frying Robot has released a new work.

This time, the robot is not making fried rice for us, but performing surgery on us!

Recently, researchers from Johns Hopkins and Stanford University conducted a new exploration:

Can the famous medical robot Da Vinci learn surgical operations through imitation learning?

After experimenting, they succeeded!

Da Vinci can perform three basic surgical tasks independently: tissue manipulation, needle handling and knot tying.


First, suturing and knot tying, techniques that medical students must drill into their fingers through long practice. Da Vinci ties knots deftly, needle flying:


The next step was picking up and transferring the needle, which Da Vinci was also able to do with precision and without any delay.


The third task is lifting tissue. As the footage shows, da Vinci picked the correct grasp point and lifted the tissue with ease.


The most important thing is that all the above actions were completed independently by Da Vinci!


Sure enough, this level of sophisticated operation has a familiar feel no matter how you look at it.


Paper address: https://arxiv.org/abs/2407.12998

Blog address: https://surgical-robot-transformer.github.io/

It should be noted that compared with desktop operations in home environments, surgical tasks require precise manipulation of deformable objects and face hard perception problems such as inconsistent lighting and occlusion.

Additionally, surgical robots can often suffer from inaccurate proprioception and hysteresis.

How did they overcome these problems?

Large clinical data repositories from which robots can learn

Large-scale imitation learning shows great promise for general-purpose manipulation systems, such as robots that do our chores.


However, this time, the researchers focused on the field of surgery.

Surgery is a largely untapped field with huge potential, especially with the support of the da Vinci surgical robot.

As of 2021, 6,500 da Vinci systems have been used in 67 countries around the world, performing more than 10 million surgeries.

Furthermore, the procedures of these surgeries were fully recorded, giving us a large repository of demonstration data.

Can such a large amount of data be used to build a general system for autonomous surgery?

However, once the researchers got started, they found that teaching the da Vinci robot to perform surgery through imitation learning is difficult.

The particularity of the Da Vinci system itself poses unique challenges that hinder the implementation of imitation learning.


The upper right is a real medical environment, and the lower right is the researchers' experimental setup

Moreover, due to the inaccurate joint measurements, its forward kinematics will be inconsistent. Simply training a policy using this approximate kinematic data will usually lead to task failure.

The robot also failed to perform simple visual servoing tasks, and policies trained to output absolute end-effector poses (a common way to train robot policies) had a near-zero success rate in all tasks.


How to overcome this limitation?

The team found that the relative motion of the da Vinci system was more consistent than its absolute forward kinematics.

Therefore, they came up with a solution: introduce a relative action formulation and use the robot's approximate kinematic data for policy training and deployment.

They considered three options: camera-centric, tool-centric, and hybrid relative actions.


The camera-centric motion representation is a baseline approach that models motion as the absolute pose of the end effector relative to the endoscope tip. The other two are relative formulations that define motion relative to the current tool (i.e., end effector) frame or the endoscope tip frame.
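The three representations can be sketched with homogeneous transforms. This is a minimal illustration, not the authors' code; the frame conventions and the exact split used by the hybrid variant are assumptions based on the description above.

```python
import numpy as np

def inv(T):
    """Invert a 4x4 rigid homogeneous transform."""
    R, p = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ p
    return Ti

def camera_centric_action(T_cam_ee_next):
    """Baseline: absolute target pose of the end effector in the endoscope-tip frame."""
    return T_cam_ee_next

def tool_centric_action(T_cam_ee, T_cam_ee_next):
    """Relative: target pose expressed in the current tool (end-effector) frame."""
    return inv(T_cam_ee) @ T_cam_ee_next

def hybrid_relative_action(T_cam_ee, T_cam_ee_next):
    """Hybrid (assumed split): delta translation in the camera frame,
    delta rotation in the current tool frame."""
    dp_cam = T_cam_ee_next[:3, 3] - T_cam_ee[:3, 3]
    dR_tool = T_cam_ee[:3, :3].T @ T_cam_ee_next[:3, :3]
    return dp_cam, dR_tool
```

Note that the tool-centric delta is unchanged if the same constant error transform is applied to both poses (the error cancels in `inv(E @ T1) @ (E @ T2)`), which is one way to see why relative formulations tolerate miscalibrated kinematics.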

The policy is then trained with images as input and the above action representations as output targets.

This is different from previous work, which used kinematic data as input. However, in this work, the kinematic data of Da Vinci may not be reliable.

Their model is based on ACT (Action Chunking with Transformers), a Transformer-based architecture.


The team proposed a policy design that takes only images as input and outputs relative pose trajectories.
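Outputting a trajectory means the policy predicts a short chunk of future actions per inference call, which the robot executes before re-observing. A toy sketch of that control loop follows; the network is stubbed out, and the chunk size and action dimensionality are assumptions, not values from the paper.

```python
import numpy as np

CHUNK = 10       # actions predicted per inference call (assumed)
ACTION_DIM = 14  # assumed: 7 per arm (delta position, delta orientation, jaw angle)

def policy(observation):
    """Stand-in for the trained Transformer: returns a chunk of relative actions."""
    rng = np.random.default_rng(0)
    return rng.normal(scale=1e-3, size=(CHUNK, ACTION_DIM))

def rollout(get_obs, execute, steps=100):
    """Predict a chunk, execute it open-loop, re-observe, repeat."""
    executed = 0
    while executed < steps:
        for action in policy(get_obs()):
            execute(action)
            executed += 1
            if executed >= steps:
                break
    return executed
```

The original ACT work additionally smooths overlapping chunks with temporal ensembling; that refinement is omitted here for brevity.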

If this approach is successful, large repositories of clinical data containing approximate kinematics could be used directly for robot learning without further calibration.

This is undoubtedly of great significance for the clinical surgical operations of robots.

Sure enough, after introducing the relative action formulation, the team successfully demonstrated imitation learning on da Vinci using approximate kinematic data. It required no further kinematic correction, and the results were far better than the baseline method.

Experiments show that imitation learning can not only effectively learn complex surgical tasks, but also generalize to new scenarios, such as unseen realistic animal tissue.

In addition, wrist cameras are also very important for learning surgical operation tasks.


Now, in addition to the previously demonstrated autonomous tasks such as tissue manipulation, needle handling and knot tying, the da Vinci robot can also perform the following operations.

Zero-shot generalization

The Stanford team’s model showed the ability to adapt to new scenarios, such as when presented with unknown animal tissues.

Here's a video of the da Vinci robot suturing pork tissue and tying knots:


On chicken tissue, da Vinci could likewise accurately pick up a surgical needle placed on the surface of the meat.


This shows promise for expansion into future clinical studies.

Retry Behavior

So, if there are some environmental disturbances, can Da Vinci still perform stably?

As the footage shows, when another instrument suddenly intruded and deliberately stripped off the suture, da Vinci did not stop, and continued tying the knot.


In the full video below, da Vinci fails to pick up the surgical needle on its first attempt, but quickly notices this and automatically adjusts, successfully picking it up.


Repeatability test

Clinical surgery is no joke: a clinical robot's repeatability must be guaranteed, and safety is a non-negotiable requirement.

The research team released a video of repeated tests of Da Vinci, observing its multiple operations from different perspectives, and it was basically impeccable.




Technical Path

As shown in the figure below, the dVRK system of the da Vinci robot consists of an endoscopic camera manipulator (ECM) and two patient-side manipulators (PSM1 and PSM2) that share the same robot base.

Each arm is a serial chain of passive setup joints followed by motorized active joints.

However, because all joints rely on potentiometers for measurement, the arm's forward kinematics is inaccurate, with errors of up to 5 cm.


Unfortunately, the forward kinematics data provided by the dVRK is unreliable. The setup joints (blue) use only potentiometers for joint measurement, which are imprecise; the active joints (pink) use both potentiometers and motor encoders for improved accuracy.

Given the inaccuracy of the robot's forward kinematics, the team proposed the three action representations described above to enable da Vinci to complete surgical operations through imitation learning. Among them, the hybrid relative method further improved the accuracy of translational actions.

Implementation details

To train feasible policies, the team used Action Chunking with Transformers (ACT) and a diffusion policy.

They trained the policy on endoscope and wrist-camera images, downscaled to 224x224x3.

The original endoscope images are 1024x1280x3, and the wrist images are 480x640x3.

Unlike in other imitation learning methods, kinematic data is not provided as input, because the dVRK's kinematic data is usually inconsistent due to its design limitations.

The policy outputs include the end-effector (delta) position, (delta) orientation, and the dual-arm jaw angle.
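The input/output layout described above can be summarized in a small sketch. The nearest-neighbor resize is a stand-in for a proper image resize, and the exact per-arm layout (3 delta-position + 3 delta-orientation + 1 jaw-angle values) is an illustrative assumption.

```python
import numpy as np

def resize(img, h=224, w=224):
    """Nearest-neighbor downscale (stand-in for a proper image resize)."""
    H, W, _ = img.shape
    return img[np.arange(h) * H // h][:, np.arange(w) * W // w]

def preprocess(endoscope_img, wrist_left, wrist_right):
    """Stack all camera streams as 224x224x3 policy inputs.

    Endoscope frames are 1024x1280x3; wrist frames are 480x640x3.
    """
    return np.stack([resize(endoscope_img), resize(wrist_left), resize(wrist_right)])

def split_action(a):
    """Assumed layout: per arm, 3 delta position + 3 delta orientation + 1 jaw angle."""
    a = np.asarray(a)
    assert a.shape == (14,)
    left, right = a[:7], a[7:]
    return {"left":  {"dpos": left[:3],  "drot": left[3:6],  "jaw": left[6]},
            "right": {"dpos": right[:3], "drot": right[3:6], "jaw": right[6]}}
```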

Experimental Procedure

In this experiment, the researchers aimed to find out the answers to these questions:

1. Is imitation learning sufficient for complex surgical tasks?
2. Is the relative motion of the dVRK more stable than its absolute forward kinematics?
3. Is a wrist-mounted camera critical to improving success rates?
4. Can the model generalize effectively to new, unseen scenarios?

The first thing to assess is whether the relative motion of the da Vinci is more consistent than its absolute forward kinematics.

The evaluation method consists of repeatedly recording reference trajectories, using the absolute and relative action formulations, across different robot configurations.

Specifically, the robot needed to use the same holes in a dome that simulated the human abdomen, with the arm and endoscope placed in roughly similar positions.

This task is not trivial, as the hole is much larger than the dimensions of the endoscope and tool shaft, and the tool must be manually placed into the hole by moving the mounting adapter.

Overall, the experiments show that relative motion is more consistent in the presence of measurement errors. Therefore, modeling policy actions as relative motion is a better choice.
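The consistency argument can be illustrated with a toy model. Under the assumption that the kinematic error behaves like a roughly constant offset on absolute poses (a simplification made up for illustration), absolute replay inherits the full offset, while relative (delta) replay reproduces the reference almost exactly:

```python
import numpy as np

def tracking_error(reference, replayed):
    """Mean Euclidean deviation between corresponding waypoints (meters)."""
    ref, rep = np.asarray(reference), np.asarray(replayed)
    return float(np.mean(np.linalg.norm(ref - rep, axis=-1)))

rng = np.random.default_rng(0)
ref = np.cumsum(rng.normal(scale=0.01, size=(50, 3)), axis=0)  # reference waypoints
offset = np.array([0.03, -0.02, 0.01])  # a constant cm-scale kinematic error (made up)

# Absolute replay: every commanded pose inherits the calibration offset.
abs_replay = ref + offset
# Relative replay: start from the true first pose, then accumulate recorded deltas.
rel_replay = ref[0] + np.vstack([np.zeros(3),
                                 np.cumsum(np.diff(ref, axis=0), axis=0)])

print(tracking_error(ref, abs_replay))  # equals |offset|, here about 0.037 m
print(tracking_error(ref, rel_replay))  # ~0
```

This mirrors the paper's observation only qualitatively; in reality the error also has pose-dependent components, which is why relative actions reduce rather than eliminate it.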


In this configuration, a total of 224 tissue-lifting demonstrations, 250 needle pickup-and-handover demonstrations, and 500 knot-tying demonstrations were collected.

Figure 5 shows reference trajectories recorded repeatedly for various robot configurations to test the repeatability of all action representations.

The left figure shows a perfect reconstruction of the reference trajectory for all action representations, since the robot joints have not moved since the reference trajectory was acquired.

When the robot moves left or right (center and right), the camera-centric motion representation cannot track the reference trajectory, while the relative motion representation can track the reference trajectory well.


Trajectory tracking for various robot configurations

In addition, the team evaluated the task success rate of models trained using various action representations.

The results show that policies trained using relative action representations (tool-centric and hybrid relative action representations) perform well, while policies trained using absolute forward kinematics fail.

In the picture below, the top row shows the tissue lifting task, where the robot needs to grab a corner of the rubber pad (tissue) and lift it up.

During training, one corner of the tissue was kept within the red box; the figure shows the corner configurations used during testing.

The middle row shows needle pickup and handover.

During training, the needle was randomly placed inside the red box. During testing, the center bump of the needle was placed in the 9 positions shown in the figure to enforce a consistent setting during evaluation.

In the bottom row, the robot needs to form a loop with the rope on the left, grab the end of the rope through the loop, and then pull the grippers apart to complete the knot.

During training, the rope's position on the mat was randomized within the red box; during testing, the rope was placed at the center of the red box.


The video below shows the results of a policy trained using the arm's absolute forward kinematics (camera-centric actions).

These policies fail to accomplish the task due to errors in the forward kinematics of the da Vinci arm, which change significantly between training and inference.




Furthermore, the researchers observed that the wrist camera provided significant performance improvements when learning surgical manipulation tasks.


Clearly, surgical robots capable of autonomous learning are expected to further expand surgeons’ capabilities in the future.

References:

https://surgical-robot-transformer.github.io/