Peking University School of Computer Science publishes in a top international AI journal! Professor Zhang Ming's team trains a bioactivity foundation model on 1.6 million data points

2024-08-16

The article is reproduced from New Wisdom

Small molecule bioactivity plays a crucial role in drug development.

Biological activity reflects the extent to which a small molecule interacts with a specific target (such as a protein, receptor or enzyme) in a biological system and induces a measurable biological response. It is a key indicator for screening potential drug candidates, optimizing molecular structure, and predicting drug efficacy and safety.

Accurate prediction and evaluation of biological activity can not only significantly shorten drug screening time and reduce R&D costs, but also help researchers understand the mechanism of drug action, thereby accelerating the development of new drugs and bringing more effective and safer treatment options to patients.

In bioactivity prediction, existing physics-based computational methods such as free energy perturbation (FEP) can give accurate predictions, but they are computationally expensive.

In recent years, deep learning methods have shown great potential, but they are held back by limited experimental data and by the fact that bioactivity values measured in different experiments are often not directly comparable.

Previous work applied advanced machine learning techniques such as transfer learning, multi-task learning, and meta-learning. However, these models were trained only on specific bioactivity types (such as Ki, Kd, and IC50) measured in molar concentration units, which makes it difficult for them to generalize to prediction tasks involving unseen activity types (such as EC50) or units (such as '%').
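To make the unit problem concrete: activities reported as a molar concentration (Ki, Kd, IC50, EC50) are commonly placed on one scale by taking the negative base-10 logarithm of the value in mol/L (for example, pIC50), whereas a percentage readout has no such conversion. The sketch below illustrates that general convention only; it is not code from ActFound, and `UNIT_TO_MOLAR` and `to_p_value` are hypothetical names.

```python
import math

# Multipliers that convert each concentration unit to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_value(value: float, unit: str) -> float:
    """Convert a concentration-based activity (IC50, Ki, Kd, EC50)
    to its negative log10 molar form (pIC50, pKi, pKd, pEC50)."""
    if unit not in UNIT_TO_MOLAR:
        # Non-molar readouts such as '%' inhibition cannot go on this scale.
        raise ValueError(f"cannot place unit {unit!r} on a molar log scale")
    if value <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


print(to_p_value(100, "nM"))
print(to_p_value(1, "uM"))
try:
    to_p_value(45, "%")
except ValueError as err:
    print(err)
```

Here 100 nM and 1 uM come out at about 7 and 6, so these values can be compared directly, while the '%' value is rejected. A model trained only on the molar scale has nothing to map such a readout onto.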

To address this challenge, Professor Zhang Ming's team from the School of Computer Science at Peking University, together with Assistant Professor Wang Sheng and Postdoctoral Fellow Xiao Zhiping of the University of Washington and Professor Xu Yinghui of Fudan University, proposed ActFound, a bioactivity foundation model trained on 1.6 million experimentally measured bioactivity data points from the ChEMBL database.

The work has now been published in Nature Machine Intelligence (NMI), one of the world's top artificial intelligence journals, with a latest impact factor of 18.8.

Paper link: https://www.nature.com/articles/s42256-024-00876-w

Share link: https://rdcu.be/dQUav

ActFound's training data, code, and models are open source: https://github.com/BFeng14/ActFound

The core idea of ActFound is pairwise learning: the model learns the relative bioactivity difference between two small molecules measured in the same experiment, which sidesteps the incompatibility of bioactivity values across different experiments. ActFound also uses meta-learning so that the model can reach high prediction accuracy from only a small amount of data.
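A minimal sketch of the pairwise idea, under stated assumptions: the article does not describe ActFound's architecture, so a linear least-squares model on synthetic data stands in for the real network, and every name below (`make_assay`, `pairwise_dataset`, `predict_query`) is hypothetical. Each simulated assay shares the same structure-activity relationship but adds its own constant offset, standing in for experiment-specific incompatibility. Training only on within-assay differences cancels the offset. To turn predicted differences back into an absolute value, the sketch assumes one natural approach: anchor on measured reference molecules from the query's own assay and average.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n_features = 8
true_w = rng.normal(size=n_features)  # hypothetical structure-activity weights


def make_assay(n_mols, offset):
    """Simulate one assay: shared underlying activity plus an assay-specific offset."""
    x = rng.normal(size=(n_mols, n_features))
    y = x @ true_w + offset
    return x, y


# Two assays whose raw values are not directly comparable (different offsets).
assays = [make_assay(20, offset=0.0), make_assay(20, offset=3.5)]


def pairwise_dataset(assays):
    """Build (x_i - x_j, y_i - y_j) pairs, only within the same assay."""
    dx, dy = [], []
    for x, y in assays:
        for i, j in itertools.permutations(range(len(y)), 2):
            dx.append(x[i] - x[j])
            dy.append(y[i] - y[j])
    return np.array(dx), np.array(dy)


dx, dy = pairwise_dataset(assays)
# Least squares on differences: each assay's constant offset cancels out.
w_hat, *_ = np.linalg.lstsq(dx, dy, rcond=None)


def predict_query(x_query, x_refs, y_refs, w):
    """Anchor on each measured reference, add the predicted difference, average."""
    return np.mean(y_refs + (x_query - x_refs) @ w)


# A new assay with an unseen offset and only three measured references.
x_new, y_new = make_assay(4, offset=-2.0)
pred = predict_query(x_new[0], x_new[1:], y_new[1:], w_hat)
print(f"predicted {pred:.3f}, measured {y_new[0]:.3f}")
```

Because the synthetic data is noise-free, the anchored prediction matches the measured value up to floating-point error, even though the new assay's offset never appeared in training. A model fitted to pooled raw values would instead have to absorb the offsets as noise.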

According to the reviewers, combining pairwise learning with meta-learning not only solves a core problem in activity prediction but could also inspire work in other fields.

On six bioactivity benchmark datasets, ActFound made accurate predictions and generalized well across different bioactivity types and molecular scaffolds.

The study also shows that ActFound can serve as an alternative to FEP+, the leading physics-based computational tool, achieving comparable performance after fine-tuning on only a small amount of data.
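Fine-tuning on very little data is what meta-learning is meant to enable. The idea is to learn an initialization from many tasks so that a few gradient steps on a new task's handful of measurements are enough. The article does not say which meta-learning algorithm ActFound uses. As an assumption, the sketch below uses first-order MAML (FOMAML), one common choice, on synthetic linear tasks that each stand in for an assay. The names `sample_task`, `adapt`, `meta_train`, and `few_shot_error` are hypothetical, and the step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 5
SHARED_W = rng.normal(size=DIM)  # hypothetical structure shared by all tasks


def sample_task():
    """One synthetic 'assay': a linear task whose weights lie near SHARED_W."""
    w = SHARED_W + 0.1 * rng.normal(size=DIM)

    def draw(n):
        x = rng.normal(size=(n, DIM))
        return x, x @ w

    return draw


def mse_grad(theta, x, y):
    """Gradient of mean squared error for the linear model x @ theta."""
    return 2.0 * x.T @ (x @ theta - y) / len(y)


def adapt(theta, x, y, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on a task's small support set."""
    for _ in range(steps):
        theta = theta - lr * mse_grad(theta, x, y)
    return theta


def meta_train(iters=2000, meta_lr=0.05, k_shot=5):
    """First-order MAML: update the initialization with the query-set
    gradient evaluated at the task-adapted parameters."""
    theta = np.zeros(DIM)
    for _ in range(iters):
        draw = sample_task()
        xs, ys = draw(k_shot)  # support set used for adaptation
        xq, yq = draw(20)      # query set used for the meta-update
        adapted = adapt(theta, xs, ys)
        theta = theta - meta_lr * mse_grad(adapted, xq, yq)
    return theta


def few_shot_error(init, k_shot=5, n_tasks=200):
    """Average query MSE after adapting `init` on k_shot examples per new task."""
    errs = []
    for _ in range(n_tasks):
        draw = sample_task()
        xs, ys = draw(k_shot)
        xq, yq = draw(50)
        adapted = adapt(init, xs, ys)
        errs.append(np.mean((xq @ adapted - yq) ** 2))
    return float(np.mean(errs))


meta_init = meta_train()
err_meta = few_shot_error(meta_init)
err_scratch = few_shot_error(np.zeros(DIM))
print(f"5-shot MSE from meta-learned init: {err_meta:.3f}")
print(f"5-shot MSE from zero init:         {err_scratch:.3f}")
```

With these settings, the meta-learned initialization should give a clearly lower 5-shot error than starting from zero, because it already sits near the structure the tasks share. Full MAML would also differentiate through the inner loop; the first-order variant drops that second-order term for simplicity.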

To verify the performance and practical value of the ActFound model, the research team conducted a series of experiments on biological activity prediction tasks.

First, the researchers evaluated ActFound on six different datasets. ActFound outperformed all nine compared methods on ChEMBL, BindingDB, FS-Mol, pQSAR-ChEMBL, Davis, and KIBA, demonstrating its broad applicability across almost all types of experiments.

In cross-domain bioactivity prediction, ActFound also outperforms existing state-of-the-art methods and generalizes well across different types of bioactivity data.

Second, the research team compared ActFound with the free energy perturbation (FEP) computational tool to demonstrate ActFound's practical value in lead optimization.

Experimental results show that ActFound has the potential to serve as an alternative tool to FEP+.

Specifically, after fine-tuning on an average of only 4.8 molecules, ActFound outperformed FEP+. ActFound can predict the activity of more than 10,000 compounds in one second, whereas FEP requires 24 to 48 GPU hours to calculate the relative activity difference for a single pair of molecules.
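A back-of-envelope reading of these throughput figures, for scale only. The per-pair FEP cost and the ActFound throughput come from the article. The campaign size of 10,000 compounds and the assumption of one FEP calculation per compound are ours. Because the comparison mixes GPU hours with wall-clock seconds on unspecified hardware, it indicates orders of magnitude rather than a benchmark.

```python
# Figures quoted in the article.
FEP_GPU_HOURS_PER_PAIR = (24, 48)
ACTFOUND_COMPOUNDS_PER_SECOND = 10_000  # article says "more than", so a floor

# Hypothetical screening campaign size, chosen for illustration.
N_COMPOUNDS = 10_000
HOURS_PER_YEAR = 24 * 365


def fep_cost_gpu_hours(n_compounds: int) -> tuple[int, int]:
    """Assumes one FEP calculation (one molecular pair) per compound."""
    low, high = FEP_GPU_HOURS_PER_PAIR
    return low * n_compounds, high * n_compounds


def actfound_cost_seconds(n_compounds: int) -> float:
    """Upper bound on inference time, since the quoted throughput is a floor."""
    return n_compounds / ACTFOUND_COMPOUNDS_PER_SECOND


low, high = fep_cost_gpu_hours(N_COMPOUNDS)
print(f"FEP: {low:,} to {high:,} GPU hours "
      f"(about {low / HOURS_PER_YEAR:.0f} to {high / HOURS_PER_YEAR:.0f} GPU-years)")
print(f"ActFound: at most {actfound_cost_seconds(N_COMPOUNDS):.1f} s")
```

This prints 240,000 to 480,000 GPU hours (about 27 to 55 GPU-years) for FEP, versus at most 1.0 s for ActFound.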

Finally, the researchers showed that a cancer drug response prediction model pre-trained with ActFound performs strongly.

Experimental results show that the cancer drug response prediction model initialized with ActFound performs well even without fine-tuning, further demonstrating ActFound's broad application potential.

Overall, these results show that ActFound, as a bioactivity foundation model, not only performs well across a range of bioactivity prediction tasks but also holds promise for other stages of drug discovery and development.

These findings provide an effective solution to the limitations of existing bioactivity prediction methods and also open up new possibilities for accelerating the drug development process.

About the Author

The first author of the paper, Feng Bin, received his master's degree from the School of Computer Science at Peking University under the supervision of Professor Zhang Ming. Wang Sheng and Xiao Zhiping are also alumni of the Department of Computer Science in Peking University's School of Information Science, and have collaborated with Professor Zhang Ming's team for many years. Other members of the Peking University team include doctoral student Liu Zequn and master's student Srbuhi Mirzoyan.

All authors are Bin Feng, Zequn Liu, Nanlan Huang, Zhiping Xiao#, Haomiao Zhang, Srbuhi Mirzoyan, Hanwen Xu, Jiaran Hao, Yinghui Xu, Ming Zhang#, Sheng Wang# (those marked with # are corresponding authors).

