
The black box has been opened! A visual explainer for the Transformer that runs GPT-2 locally

2024-08-12


It’s 2024, and you still don’t know how Transformers work? Try this interactive tool.

In 2017, Google introduced the Transformer in the paper "Attention Is All You Need", a major breakthrough in deep learning. The paper has been cited nearly 130,000 times, and every subsequent model in the GPT family is built on the Transformer architecture, a testament to its broad influence.

As a neural network architecture, the Transformer is used across a wide range of tasks, from text to vision, and above all in today's booming field of AI chatbots.

However, for many non-specialists, the inner workings of the Transformer remain opaque, which keeps them from understanding and engaging with it, so demystifying the architecture is especially worthwhile. Yet many blogs, video tutorials, and 3D visualizations dwell on the mathematics and implementation details, leaving beginners at a loss, while visualizations designed for AI practitioners focus on neuron- and layer-level interpretability, which is too demanding for non-specialists.

To that end, several researchers from Georgia Tech and IBM Research developed a web-based, open-source interactive visualization tool, Transformer Explainer, which helps non-specialists understand both the high-level structure and the low-level mathematical operations of the Transformer, as shown in Figure 1 below.

Transformer Explainer illustrates the Transformer's inner workings through text generation. Its Sankey-diagram visualization design, inspired by recent work that treats Transformers as dynamic systems, emphasizes how input data flows through the model's components. In practice, the Sankey diagram effectively shows how information is passed through the model and how the input is processed and transformed by Transformer operations.

In terms of content, Transformer Explainer tightly integrates a model overview that summarizes the Transformer's structure and lets users move smoothly between multiple levels of abstraction, visualizing the interplay between low-level mathematical operations and the high-level model structure so that they can fully grasp the Transformer's complex concepts.

In terms of functionality, Transformer Explainer goes beyond a web-based implementation to offer live inference. Unlike many existing tools that require custom software installation or lack inference altogether, it integrates a live GPT-2 model that runs locally in the browser on top of a modern front-end framework. Users can interactively experiment with their own input text and observe in real time how the Transformer's internal components and parameters cooperate to predict the next token.

In a sense, Transformer Explainer broadens access to modern generative AI techniques without requiring advanced computing resources, installation, or programming skills. GPT-2 was chosen for its popularity, fast inference speed, and architectural similarity to more advanced models such as GPT-3 and GPT-4.

Paper address: https://arxiv.org/pdf/2408.04619

GitHub address: http://poloclub.github.io/transformer-explainer/

Online experience address: https://t.co/jyBlJTMa7m



Video link: https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650929831&idx=1&sn=d0e5c01537def9f92c64dda2ea3c6626&chksm=84e43ed9b393b7cf177414848deaed70ac2a5b1522a12e3399920d4862e398c113b96af7b76e&token=522918026&lang=zh_CN#rd

Since the tool accepts custom input, Synced also tried "what a beautiful day"; the results are shown in the figure below.

Transformer Explainer has drawn high praise from netizens, with some calling it a very cool interactive tool.

Others said they had long been waiting for an intuitive tool to explain self-attention and positional encoding, and that Transformer Explainer is exactly that: a potential game-changer.

Someone also made a Chinese translation.

Demo address: http://llm-viz-cn.iiiai.com/llm

This brings to mind another well-known AI educator, Karpathy, who has produced many tutorials on reproducing GPT-2, covered in articles such as "GPT-2 by hand in pure C language, the new project of former OpenAI and Tesla executives has become popular" and "Karpathy's latest four-hour video tutorial: Reproducing GPT-2 from scratch, running it overnight will get it done". Now that a visualization tool for the Transformer's inner workings exists, pairing the two should make learning even more effective.

Transformer Explainer System Design and Implementation

The Transformer Explainer visualization shows how the Transformer-based GPT-2 model processes text input and predicts the next token. The front end uses Svelte and D3 for the interactive visualization, while the back end uses ONNX Runtime and HuggingFace's Transformers library to run the GPT-2 model directly in the browser.
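As an illustration, here is a minimal sketch of what in-browser inference with onnxruntime-web looks like. The model path, tensor names, and function names are assumptions made for illustration, not code taken from the Transformer Explainer source.

```typescript
// Minimal sketch of in-browser GPT-2 inference with onnxruntime-web.
// The model path ("/models/gpt2.onnx") and tensor names ("input_ids",
// "logits") are illustrative assumptions, not the tool's actual code.
import * as ort from "onnxruntime-web";

// Create the session once; the browser fetches and caches the .onnx weights.
const sessionPromise = ort.InferenceSession.create("/models/gpt2.onnx");

async function nextTokenLogits(tokenIds: number[]): Promise<Float32Array> {
  const session = await sessionPromise;
  // GPT-2 expects int64 token ids with shape [batch, sequenceLength].
  const inputIds = new ort.Tensor(
    "int64",
    BigInt64Array.from(tokenIds.map((id) => BigInt(id))),
    [1, tokenIds.length]
  );
  const outputs = await session.run({ input_ids: inputIds });
  // "logits" has shape [1, n, vocab]; keep the last position, which scores
  // every vocabulary entry as a candidate next token.
  const logits = outputs["logits"].data as Float32Array;
  const vocabSize = 50257; // GPT-2's vocabulary size
  return logits.slice(logits.length - vocabSize);
}
```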

A major challenge in designing Transformer Explainer was managing the complexity of the underlying architecture, because showing every detail at once makes viewers lose focus. To solve this, the researchers followed two key design principles.

First, the researchers reduced complexity through multiple levels of abstraction. They structured the tool to present information at different levels of abstraction. This allows users to start with a high-level overview and gradually drill down into details as needed, thus avoiding information overload. At the highest level, the tool shows the complete processing flow: from receiving user-provided text as input (Figure 1A), embedding it, processing it through multiple Transformer blocks, and then using the processed data to rank the most likely next token predictions.

Intermediate operations, such as the computation of the attention matrix (Figure 1C), are collapsed by default so that the significance of their results stays visible; users can expand them to watch the derivation unfold through an animated sequence. The researchers also adopted a consistent visual language, such as stacking attention heads and collapsing repeated Transformer blocks, to help users recognize recurring patterns in the architecture while keeping the end-to-end flow of the data intact.
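To make the collapsed computation concrete, here is a minimal sketch of what an attention-matrix calculation does: scaled dot products of queries and keys, a causal mask, then a row-wise softmax. The function name and plain-array representation are illustrative, not the tool's implementation.

```typescript
// Sketch of the attention-matrix computation the tool animates.
// Q and K each hold one row vector per token (shape: [n][d]).
function causalAttentionWeights(Q: number[][], K: number[][]): number[][] {
  const n = Q.length;    // sequence length
  const d = Q[0].length; // per-head dimension
  const weights: number[][] = [];
  for (let i = 0; i < n; i++) {
    // Score for each (query i, key j) pair: dot product scaled by sqrt(d).
    const scores = K.map((k, j) =>
      j <= i
        ? Q[i].reduce((sum, q, t) => sum + q * k[t], 0) / Math.sqrt(d)
        : -Infinity // causal mask: token i cannot attend to later tokens
    );
    // Row-wise softmax, subtracting the max for numerical stability.
    const m = Math.max(...scores);
    const exps = scores.map((s) => Math.exp(s - m));
    const z = exps.reduce((a, b) => a + b, 0);
    weights.push(exps.map((e) => e / z));
  }
  return weights;
}
```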

Second, the tool enhances understanding and engagement through interactivity. The temperature parameter is crucial in shaping the Transformer's output probability distribution, making the next-token prediction more certain (at low temperatures) or more random (at high temperatures), yet existing educational resources on Transformers often overlook it. Users of the new tool can now adjust the temperature parameter in real time (Figure 1B) and visualize its key role in controlling prediction certainty (Figure 2).
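What the slider changes can be written down in a few lines: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. This is a generic sketch of temperature sampling, not code from the tool.

```typescript
// Temperature-scaled softmax sampling: returns the index of one sampled token.
function sampleWithTemperature(logits: number[], temperature: number): number {
  // Dividing by T < 1 exaggerates gaps between logits (more deterministic);
  // T > 1 shrinks them (more random).
  const scaled = logits.map((l) => l / temperature);
  const m = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - m));
  const z = exps.reduce((a, b) => a + b, 0);
  // Draw one index from the normalized distribution.
  let r = Math.random() * z;
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return exps.length - 1;
}
```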

In addition, users can choose from the provided examples or enter their own text (Figure 1A). Support for custom input text deepens engagement, letting users analyze the model's behavior under different conditions and interactively test their own hypotheses against different inputs.

So what are the actual application scenarios?

Professor Rousseau, who is modernizing her natural language processing course to highlight recent advances in generative AI, has noticed that some students view Transformer-based models as inscrutable "magic," while others want to understand how these models work but are unsure where to start.

To address this, she introduced her students to Transformer Explainer, which provides an interactive overview of the Transformer (Figure 1) and encourages active experimentation and learning. With more than 300 students in her class, the fact that Transformer Explainer runs entirely in the browser, with no software installation or special hardware required, is a significant advantage that spares students any setup concerns.

The tool introduces students to complex mathematical operations, such as attention computation, through animated, interactive, reversible abstractions (Figure 1C). This approach helps students gain both a high-level understanding of the operations and a deeper grasp of the underlying details that produce their results.

Professor Rousseau has also found that the Transformer's technical capabilities and limitations are sometimes anthropomorphized (for example, treating the temperature parameter as a "creativity" control). By encouraging students to experiment with the temperature slider (Figure 1B), she shows how temperature actually modifies the probability distribution over the next token (Figure 2), controlling the randomness of the prediction and striking a balance between deterministic and more creative output.

Furthermore, as the system visualizes the token-processing flow, students can see that there is no "magic" involved: whatever the input text (Figure 1A), the model follows a well-defined sequence of operations within the Transformer architecture, samples just one token at a time, and then repeats the process.
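That repetition is a plain loop. Reusing the hypothetical nextTokenLogits and sampleWithTemperature sketches above, generation looks roughly like this; maxNewTokens and the temperature value of 0.8 are illustrative choices.

```typescript
// Sketch of the autoregressive loop the visualization makes visible:
// each step runs the full model, samples exactly one token, appends it,
// and repeats with the extended context.
async function generate(promptIds: number[], maxNewTokens = 20): Promise<number[]> {
  const ids = [...promptIds];
  for (let step = 0; step < maxNewTokens; step++) {
    const logits = await nextTokenLogits(ids); // forward pass over the whole sequence
    const next = sampleWithTemperature(Array.from(logits), 0.8);
    ids.push(next); // the sampled token becomes part of the next step's input
  }
  return ids;
}
```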

Future Work

The researchers are enhancing the tool's interactive explanations to improve the learning experience. At the same time, they are also improving inference speed through WebGPU and reducing the size of the model through compression techniques. They also plan to conduct user studies to evaluate the performance and usability of Transformer Explainer, observe how AI novices, students, educators, and practitioners use the tool, and collect feedback on additional features they would like to support.

What are you waiting for? Try it yourself, dispel the illusion that the Transformer is "magic," and truly understand the principles behind it.