
Robots sew wounds and tie knots without shaking hands, created by Hopkins and Stanford ALOHA authors

2024-07-18


Report by the Synced (Machine Heart) editorial department

These robots can perform surgical subtasks remarkably well.

Has robotic surgery really come this far? The movements look skilled, and the "hands" are remarkably steady.

The robot's two "hands" shuttle deftly through a piece of pork, suturing and tying knots:



The force is just right, and the "wound" is sutured neatly:

The robot can also accurately pick up a surgical needle resting on various objects without grasping anything else:



It can also lift tissue precisely for subsequent observation and manipulation.

The research above comes from a joint team at Johns Hopkins University and Stanford University, working with the da Vinci surgical robot.



Paper address: https://surgical-robot-transformer.github.io/resources/surgical_robot_transformer.pdf

Project homepage: https://surgical-robot-transformer.github.io/

Team members include Ji Woong (Brian) Kim, a postdoctoral fellow at Johns Hopkins who worked on the research with Axel Krieger and Chelsea Finn; Tony Z. Zhao, a Stanford doctoral student who worked on the Mobile ALOHA and ALOHA 2 household robots and whose mentor is Chelsea Finn; Samuel Schmidgall, a first-year doctoral student in electrical and computer engineering at Johns Hopkins; Anton Deguet, an assistant research engineer at Johns Hopkins; and Marin Kobilarov, an assistant professor at Johns Hopkins.



Until now, robot learning research has focused largely on everyday household activities, while surgery has remained underexplored, even though the da Vinci robots from surgical robotics company Intuitive Surgical are deployed worldwide and offer enormous potential for scale: as of 2021, more than 10 million procedures had been performed across 67 countries on 6,500 da Vinci systems, and 55,000 surgeons have been trained on the platform.

The study explored whether surgical manipulation tasks can be learned on the da Vinci robot through imitation learning. To this end, the researchers introduced a relative action formulation that allows policies to be trained and deployed successfully even with only approximate kinematic data. This makes it possible to use large amounts of clinical data directly for robot learning without further correction. The resulting robot performed well on three fundamental surgical tasks: tissue manipulation, needle handling, and knot tying.

This surgical robot has been well received by many netizens, who called it "incredible."



Method Overview

Figure 3 below shows the dVRK system, which consists of the robot and a remote console where the surgeon operates. The dVRK has an endoscopic camera manipulator (ECM) and two patient-side manipulators (PSM1, PSM2) that share a robot base. Each robotic arm consists of a sequence of passive setup joints (SUJs) followed by motorized active joints.



The passive joints are measured only with potentiometers, which is very inaccurate. The active joints use both potentiometers and motor encoders, which improves accuracy. Overall, however, relying on potentiometers across the joints makes the robot arm's forward kinematics inaccurate, with errors of up to 5 cm.
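To see why small joint-measurement errors matter, here is a rough illustration (my own sketch, not the paper's code, using a hypothetical planar 2-link arm rather than the dVRK's kinematics) of how degree-level potentiometer errors propagate through forward kinematics into centimeter-scale end-effector error:

```python
import numpy as np

# Hypothetical planar 2-link arm; link lengths are assumed, not from the paper.
L1, L2 = 0.40, 0.40  # meters

def forward_kinematics(q1, q2):
    """End-effector (x, y) position of a planar 2-link arm."""
    x = L1 * np.cos(q1) + L2 * np.cos(q1 + q2)
    y = L1 * np.sin(q1) + L2 * np.sin(q1 + q2)
    return np.array([x, y])

q = np.array([0.3, 0.5])           # true joint angles (rad)
noise = np.deg2rad([2.0, -1.5])    # assumed potentiometer error per joint

true_pos = forward_kinematics(*q)
measured_pos = forward_kinematics(*(q + noise))
err_cm = np.linalg.norm(true_pos - measured_pos) * 100
print(f"End-effector error: {err_cm:.1f} cm")
```

Even a couple of degrees of joint error already produces an end-effector error on the order of a centimeter, which is far too large for tasks like needle handover.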

The researchers' goal is to learn surgical manipulation tasks through imitation learning. Given the robot's inaccurate forward kinematics, choosing an appropriate action representation is crucial. They therefore studied three action representations, namely camera-centric, tool-centric, and hybrid-relative, as shown in Figure 4 below.



The camera-centric approach serves as a baseline and highlights the limitations of modeling actions as the absolute pose of the end effector. The tool-centric approach improves on it by modeling actions as relative motion, raising the success rate. The hybrid-relative approach improves further by defining translational motion relative to a fixed reference frame, increasing translational accuracy.

Let's first look at the camera-centric action. The researchers model it as the absolute pose of the end effector relative to the endoscope tip frame. This setup resembles how position-based visual servoing (PBVS) applications are implemented, making it a natural choice for the dVRK.

Specifically, given an observation o_t at time t, the goal is to learn a policy π that predicts the action sequence A_t,C = (a_t, ..., a_t+C), where C is the action prediction horizon. The policy is defined as follows:
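As a rough illustration of the camera-centric formulation (my own sketch, not the paper's code; the transform names are illustrative), the action can be computed as the end effector's absolute target pose expressed in the endoscope frame:

```python
import numpy as np

def inv(T):
    """Invert a 4x4 rigid (homogeneous) transform."""
    Ti = np.eye(4)
    Ti[:3, :3] = T[:3, :3].T
    Ti[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return Ti

def camera_centric_action(T_base_cam, T_base_ee_target):
    """Absolute end-effector target pose expressed in the camera frame."""
    return inv(T_base_cam) @ T_base_ee_target

# Example: camera 10 cm above the base origin, target 20 cm up and 5 cm right.
T_base_cam = np.eye(4); T_base_cam[:3, 3] = [0.0, 0.0, 0.10]
T_target = np.eye(4);   T_target[:3, 3] = [0.05, 0.0, 0.20]
action = camera_centric_action(T_base_cam, T_target)
print(action[:3, 3])  # target position as seen from the camera
```

Because this is an absolute pose, any error in the estimated camera or arm pose (e.g. from the potentiometers) translates directly into action error, which is the weakness the relative formulations address.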



The second is the tool-centric action. The researchers model it as motion relative to the current end-effector frame (i.e., a moving body frame). The tool-centric action is therefore defined as follows:



Finally, there is the hybrid-relative action. As with the tool-centric action, the researchers model it as relative motion, but with respect to two different reference frames: the incremental translation is defined relative to the endoscope tip frame, while the incremental rotation is defined relative to the current end-effector frame. The hybrid-relative action is defined as follows:
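A minimal sketch (my own, assumed rather than the authors' implementation) contrasting the two relative formulations: tool-centric expresses both the delta rotation and the delta translation in the current end-effector frame, while hybrid-relative keeps the delta rotation in the end-effector frame but expresses the delta translation in the fixed endoscope (camera) frame:

```python
import numpy as np

def tool_centric_action(T_ee, T_target):
    """Delta rotation and translation, both in the current end-effector frame."""
    R_ee, p_ee = T_ee[:3, :3], T_ee[:3, 3]
    dR = R_ee.T @ T_target[:3, :3]
    dp = R_ee.T @ (T_target[:3, 3] - p_ee)
    return dR, dp

def hybrid_relative_action(T_ee, T_target, T_cam):
    """Delta rotation in the end-effector frame; delta translation in the camera frame."""
    dR = T_ee[:3, :3].T @ T_target[:3, :3]
    dp = T_cam[:3, :3].T @ (T_target[:3, 3] - T_ee[:3, 3])
    return dR, dp

# With the end effector rotated 90 degrees about z, the same world-space step
# [1, 0, 0] yields different translation components under the two schemes.
Rz90 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
T_ee = np.eye(4); T_ee[:3, :3] = Rz90
T_target = np.eye(4); T_target[:3, :3] = Rz90; T_target[:3, 3] = [1., 0., 0.]
T_cam = np.eye(4)  # camera frame aligned with the base, for illustration only

_, dp_tool = tool_centric_action(T_ee, T_target)
_, dp_hybrid = hybrid_relative_action(T_ee, T_target, T_cam)
print(dp_tool)    # step expressed in the rotated tool frame
print(dp_hybrid)  # step expressed in the fixed camera frame
```

The hybrid scheme decouples translation from the tool's current orientation, which is why large rotations (such as the 90-degree needle handover discussed later) are less likely to corrupt the translational part of the action.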



Experimental Results

For data collection, the robot was set up in the configuration shown in Figure 5 below. In this configuration, the researchers collected 224 tissue-lifting trials, 250 needle pickup-and-handover trials, and 500 knot-tying trials, all from a single user over multiple days.

Assessing the consistency of relative motion versus absolute forward kinematics. The researchers wanted to know whether relative motions on the dVRK are more repeatable than its absolute forward kinematics. To test this hypothesis, they teleoperated a reference trajectory, such as the infinity symbol shown in Figure 5.

The end effector was then placed in the same initial pose and the trajectory was replayed using the different action representations under different robot configurations, obtained by shifting the robot's workspace to the left and to the right. These workspace shifts move the robot's setup joints, which, being measured only by potentiometers, are prone to large measurement errors.



The root mean square error (RMSE) results are shown in Table 1. In the reference configuration, since the setup joints do not move, all action representations reconstruct the reference trajectory accurately.

Furthermore, under the shifted configurations, the relative representations (the tool-centric and hybrid-relative methods) reproduce the reference trajectory more consistently across repetitions, with little variation in numerical error. In short, in the presence of inconsistent joint measurement errors, relative motions on the dVRK behave more consistently than its absolute forward kinematics.
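For reference, this kind of consistency check can be sketched as a positional RMSE between the replayed trajectory and the reference. The sketch below is my own stand-in (the lemniscate and noise level are illustrative, not the paper's data):

```python
import numpy as np

def trajectory_rmse(reference, replayed):
    """Root mean square positional error between two (N, 3) trajectories."""
    diff = reference - replayed
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))

# Hypothetical infinity-symbol (lemniscate) reference trajectory, as in Fig. 5.
t = np.linspace(0, 2 * np.pi, 200)
reference = np.stack([np.sin(t), np.sin(t) * np.cos(t), np.zeros_like(t)], axis=1)

# Simulated replay with small per-point execution noise (assumed 2 mm std).
rng = np.random.default_rng(0)
replayed = reference + 0.002 * rng.standard_normal(reference.shape)

rmse_cm = trajectory_rmse(reference, replayed) * 100
print(f"RMSE: {rmse_cm:.2f} cm")
```

A representation that is robust to setup-joint error should keep this RMSE low and stable across the left/right workspace shifts, which is what Table 1 shows for the two relative methods.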



Next, the researchers evaluated policy performance under the different action representations on three tasks: lifting tissue, picking up and handing over a needle, and tying knots. The results are shown in Table 2 below. The camera-centric action representation performs poorly on all three tasks.



The tool-centric action representation performed better on all three tasks. However, during needle pickup and handover, the handover often failed when large rotations were required. In particular, after picking up the needle, the left gripper must rotate about 90 degrees to transfer it to the other arm, as shown in Figure 6 below. During this phase the gripper's orientation appears correct, but its translation appears incorrect, which is the likely cause of the failures.