
Another breakthrough in AI! Using AI to understand AI, MIT launches the multimodal automated interpretability agent MAIA

2024-08-02




Written by | Ma Xuewei

From using drugs to stimulate the brain in "Lucy" to invading brain space with electronic interference in cyberpunk culture, humans have long fantasized about manipulating the human brain. Imagine: what would happen if humans could actually manipulate every single neuron in the brain directly?

If that were possible, we could directly probe the role of these neurons in perceiving specific objects, and perhaps do some very "science fiction" things.

In real life, such experiments are almost impossible to carry out on the human brain, but they are feasible in artificial neural networks. However, because modern models often contain millions of neurons, inspecting them by hand requires enormous effort, which makes large-scale model understanding an extremely challenging task.

To this end, a research team from the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) launched MAIA, a "Multimodal Automated Interpretability Agent" that uses neural models to perform model-understanding tasks automatically.

MAIA uses pre-trained vision-language models to automate the task of understanding neural models. Its modular design gives it the flexibility to evaluate any system and to add new experimental tools easily. In addition, it can automate complex experiments, test hypotheses through an iterative experimental approach, and update those hypotheses based on experimental results.

Jacob Steinhardt, an assistant professor at the University of California, Berkeley, believes that scaling up these methods may be one of the most important ways to understand and safely supervise artificial intelligence systems. However, the research team stresses that an enhanced MAIA will not replace human oversight of AI systems: MAIA still requires human supervision to catch errors such as confirmation bias and image generation or editing failures.

How well does it actually work?

While existing automated interpretability approaches only label or visualize data in a one-shot process, MAIA can generate hypotheses, design experiments to test them, and refine its understanding through iterative analysis. By combining a pre-trained vision-language model (VLM) with a library of interpretability tools, this multimodal approach responds to user queries by writing and running targeted experiments on specific models, refining its approach until it can provide a comprehensive answer.

At the core of the MAIA framework is an agent driven by a pre-trained multimodal model (such as GPT-4V) that is able to automatically perform experiments to explain the behavior of other systems. It does this by composing interpretability subroutines into Python programs.
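The sketch below illustrates this agentic loop in miniature: a driver model proposes a stimulus, a tool library generates it and measures a neuron's response, and the loop repeats until the driver returns a description. The tool names (`text2image`, `call_neuron`) loosely echo the tools described for MAIA but their signatures here are assumptions, and `query_vlm` is a stub standing in for a call to a multimodal model such as GPT-4V.

```python
# A minimal, self-contained sketch of a MAIA-style agent loop.
# All tools are stubs; a real system would back them with a text-to-image
# model and the network under study.
import random

def text2image(prompt: str) -> str:
    """Pretend to synthesize an image for the given prompt (stub)."""
    return f"<image generated from: {prompt!r}>"

def call_neuron(neuron_id: int, image: str) -> float:
    """Pretend to run the image through the model and return the
    activation of one neuron (stub: random value)."""
    return random.random()

def query_vlm(history: list[str]) -> dict:
    """Stand-in for the pre-trained VLM that drives the agent. Given the
    experiment log so far, it either proposes the next stimulus to test
    or returns a final natural-language description."""
    if len(history) < 3:
        return {"action": "experiment",
                "prompt": f"a close-up photo, variant {len(history)}"}
    return {"action": "describe",
            "description": "neuron appears selective for close-up textures"}

def interpret_neuron(neuron_id: int) -> str:
    history: list[str] = []
    while True:
        step = query_vlm(history)
        if step["action"] == "describe":
            return step["description"]
        image = text2image(step["prompt"])          # design a stimulus
        activation = call_neuron(neuron_id, image)  # measure the response
        history.append(f"{step['prompt']} -> activation {activation:.2f}")

if __name__ == "__main__":
    print(interpret_neuron(neuron_id=42))
```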



Figure | MAIA architecture

The research team evaluated MAIA on the neuron description paradigm. The study showed that MAIA achieved excellent description results on both real models and synthetic neuron datasets, with predictive capabilities that were better than baseline methods and comparable to human experts.



Figure | Evaluating MAIA Description

In addition, MAIA shows good application potential in removing spurious features and revealing biases, which can help human users better understand model behavior and improve model performance and fairness.

Removing spurious features with MAIA

Spurious features hurt a model's robustness in real-world scenarios. MAIA can identify and remove spurious features in a model, thereby improving its robustness. The research team trained a ResNet-18 on the Spawrious dataset, which contains four dog breeds photographed against different backgrounds.

In the training set, each dog breed is spuriously associated with a specific background (e.g., snow, jungle, desert, beach), while in the test set the breed-background pairings are scrambled. Simply by changing the query in the user prompt, the research team had MAIA find a subset of final-layer neurons that robustly predicts each breed independently of the spurious background features.

The results show that MAIA can effectively remove spurious features from the model, improving its robustness.
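As an illustration of how such a selection might be used downstream, the hedged sketch below refits a classifier on only a MAIA-flagged subset of penultimate-layer neurons and compares it against a classifier that uses all neurons. The activations, labels, and `selected_neurons` indices are all synthetic stand-ins, not the paper's actual data or results.

```python
# A sketch of "spurious feature removal": refit the final classifier using
# only the neurons flagged as tracking breed rather than background.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_features, n_classes = 1000, 512, 4  # 512 matches ResNet-18's penultimate layer

# Synthetic stand-ins for penultimate-layer activations and breed labels.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.integers(0, n_classes, size=n_train)
X_test = rng.normal(size=(200, n_features))
y_test = rng.integers(0, n_classes, size=200)

# Hypothetical subset of neuron indices judged to respond to the dog breed
# itself rather than the correlated background (snow, jungle, desert, beach).
selected_neurons = [3, 17, 42, 101, 256, 377]

# Refit the final classifier on all neurons vs. only the robust subset.
clf_all = LogisticRegression(max_iter=1000).fit(X_train, y_train)
clf_robust = LogisticRegression(max_iter=1000).fit(X_train[:, selected_neurons], y_train)

print("all neurons, scrambled test acc:   ", clf_all.score(X_test, y_test))
print("robust subset, scrambled test acc: ", clf_robust.score(X_test[:, selected_neurons], y_test))
```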

Uncovering Bias with MAIA

Models can be biased, causing them to perform poorly in certain situations, and MAIA can reveal such biases automatically. The research team took a ResNet-152 trained on ImageNet and used MAIA to check its outputs for bias.

During the experiment, MAIA was prompted to generate images related to specific categories and to observe the model's response to them. MAIA found that the model favored certain subcategories, or certain kinds of images, within a class.

This suggests that MAIA can help identify biases in models and thus improve them.
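A minimal sketch of this kind of probe is shown below: generate images for several subcategories of one class, run them through the classifier, and compare the class confidence across subcategories. Both `generate_images` and `classify` are stubs; in the real system they would be backed by a text-to-image model and the ResNet-152 under study, and the "swan" example is purely hypothetical.

```python
# A toy bias probe: compare a classifier's confidence across subcategories.
import random
random.seed(0)

def generate_images(prompt: str, n: int = 8) -> list[str]:
    """Stub for a text-to-image tool (e.g. a diffusion model)."""
    return [f"<image {i}: {prompt}>" for i in range(n)]

def classify(image: str, target_class: str) -> float:
    """Stub returning the model's confidence that the image shows
    target_class. A real probe would call the classifier here.
    We simulate a biased model that is less confident on 'black' swans."""
    penalty = 0.3 if "black" in image else 0.0
    return max(0.0, min(1.0, random.gauss(0.8 - penalty, 0.05)))

target_class = "swan"
subcategories = ["white swan on a lake", "black swan on a lake"]

for sub in subcategories:
    scores = [classify(img, target_class) for img in generate_images(sub)]
    print(f"{sub:25s} mean confidence: {sum(scores) / len(scores):.2f}")
```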



Figure | MAIA model bias detection

Shortcomings and Prospects

Although MAIA shows great potential in automatic explainability, it still has some limitations.

First, MAIA's interpretive capability is limited by the tools it relies on, such as Stable Diffusion and GPT-4. The limitations of these tools (e.g., image generation quality, cost, access restrictions) directly affect MAIA's performance. In the future, more powerful internal tools or open-source alternatives could be developed to improve the system's reliability and accessibility.

Second, MAIA's explanations are not formally verified; they rest on experimental results and natural-language descriptions, which may make them biased or misleading. In the future, formal verification methods (such as causal reasoning and theoretical analysis) could be incorporated into MAIA to improve the accuracy and reliability of its explanations.

In addition, MAIA cannot completely avoid common errors such as confirmation bias, over-interpretation, and conclusions drawn from small samples. A self-reflection mechanism could be introduced so that MAIA can identify and correct its own errors, improving the robustness of its explanations.

Looking ahead, paper co-author Tamar Rott Shaham said: "I think the natural next step for our lab is to move beyond artificial systems and apply similar experiments to human perception. Traditionally, this requires manually designing and testing stimuli, which is a labor-intensive process. With our agent, we can scale up this process and design and test large numbers of stimuli simultaneously."