news

Google's 2-billion-parameter Gemma 2 2B surpasses GPT-3.5 and runs fast on the iPhone

2024-08-02



Zhidongxi (WeChat public account: zhidxcom)
Compiled by Vanilla
Edited by Li Shuiqing

Google DeepMind's open source small model family has a new member!

According to Zhidongxi, early on the morning of August 1, Google DeepMind open-sourced the lightweight model Gemma 2 2B, whose score in the large-model arena surpasses larger models such as GPT-3.5 and Llama 2 70B.


▲Gemma 2 2B

With only 2 billion parameters, Gemma 2 2B can run quickly on phones, PCs, and other end-user devices. Developers testing it on Google AI Studio have reached inference speeds of 30–40 tokens/s.


▲Developers testing Gemma 2 2B

Launched alongside Gemma 2 2B are Gemma Scope, a tool for improving model interpretability, and ShieldGemma, a set of safety classifier models for filtering harmful content.

Gemma Scope zooms in on specific points in the model using sparse autoencoders (SAEs), optimized with the JumpReLU architecture. It helps parse the dense, complex information the model processes, letting researchers "see" inside the model as if through a microscope.

ShieldGemma targets four harm areas: hate speech, harassment, sexually explicit content, and dangerous content, and it outperforms baseline models such as GPT-4 in response tests.

The Gemma series was first launched in February this year; it is an open-source model family built by Google DeepMind on the experience of the Gemini models. In June, Google released the second-generation Gemma 2 in two parameter sizes, 9B and 27B; the 27B model quickly climbed to the top ranks of open-source models in the LMSYS large-model arena.

1. Beating models 35 times its size: for Gemma 2 2B, small scale is no handicap

Gemma 2 2B is distilled from larger models and is the third Gemma 2 model Google has released, after the 27B and 9B versions.

As a lightweight model with only 2 billion parameters, Gemma 2 2B did not sacrifice performance for its small size. In the LMSYS Chatbot Arena rankings, Gemma 2 2B scored 1126, surpassing GPT-3.5 as well as Mixtral 8x7B and Llama 2 70B, models with dozens of times as many parameters.


▲Gemma 2 2B's performance in the large model arena

Some netizens tested Gemma 2 2B on the question of whether 9.9 or 9.11 is larger, a comparison that trips up many large models. Gemma 2 2B quickly gave the correct answer.


▲Gemma 2 2B answering questions

Running speed is a major advantage of lightweight models. How fast is it? Apple machine learning researcher Awni Hannun ran Gemma 2 2B with MLX Swift on his iPhone 15 Pro, and its inference speed was visibly fast.


▲Gemma 2 2B running speed

After testing it, developer Tom Huang said it runs at about 30–40 tokens/s on Google AI Studio, "faster than Apple's model."
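Throughput figures like these are straightforward to reproduce with a simple timer. Below is a minimal sketch; `fake_generate` is a hypothetical stand-in for any streaming inference call (an MLX or llama.cpp binding, for instance), not a real model API:

```python
import time

def measure_tokens_per_second(generate_tokens, prompt):
    """Time a streaming token generator and report throughput.

    `generate_tokens` is a placeholder for any local inference call
    that yields tokens one at a time.
    """
    start = time.perf_counter()
    count = sum(1 for _ in generate_tokens(prompt))
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def fake_generate(prompt):
    # Stub generator for illustration: emits 100 tokens,
    # pretending each one takes about 1 ms to produce.
    for _ in range(100):
        time.sleep(0.001)
        yield "tok"

tps = measure_tokens_per_second(fake_generate, "Hello")
print(f"{tps:.1f} tokens/s")
```

Swapping `fake_generate` for a real streaming call against a local Gemma 2 2B runtime would give a comparable end-to-end tokens/s figure.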

In terms of deployment, Gemma 2 2B is flexible: it can run efficiently on a variety of hardware, from edge devices and laptops to cloud deployments on Vertex AI.

Developers can download Gemma 2 2B model weights on platforms such as Hugging Face and Kaggle for research and commercial applications, or try out its features in Google AI Studio.

Open source address:

https://huggingface.co/google/gemma-2-2b

2. A classifier for four types of harmful content, with response rates better than GPT-4

To improve the security and accessibility of the model, Google launched ShieldGemma, a set of security content classifier models built on Gemma 2, which is used to filter the input and output of AI models. It complements the existing security classifier suite in Google's responsible AI toolkit.


▲ShieldGemma working principle

ShieldGemma is built for four harm areas: hate speech, harassment, sexually explicit content, and dangerous content. It comes in several sizes to meet different needs: the 2B model is suited to online classification tasks, while the 9B and 27B versions provide higher performance for offline applications.
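In use, a classifier like this is prompted with a policy definition plus the text to check, one harm category at a time. The sketch below is a simplified, hypothetical illustration of assembling such a per-category policy prompt; the category names follow the four areas above, but `build_policy_prompt`, the policy wording, and the template text are assumptions, not ShieldGemma's official prompt format (which is defined on its model card):

```python
# Hypothetical one-line policy summaries for the four harm areas.
HARM_POLICIES = {
    "hate_speech": "Content attacking people based on protected attributes.",
    "harassment": "Content that threatens, intimidates, or bullies a person.",
    "sexually_explicit": "Content containing sexually explicit material.",
    "dangerous_content": "Content facilitating serious harm or illegal acts.",
}

def build_policy_prompt(category, user_text):
    """Assemble a yes/no safety-classification prompt for one category.

    Illustrative template only; the real ShieldGemma prompt format
    is documented on its Hugging Face model card.
    """
    policy = HARM_POLICIES[category]
    return (
        f"You are a policy expert. Policy ({category}): {policy}\n"
        "Does the following text violate the policy? Answer Yes or No.\n"
        f"Text: {user_text}"
    )

prompt = build_policy_prompt("harassment", "You're going to regret this.")
print(prompt)
```

A downstream classifier model would then score the Yes/No answer to decide whether to filter the input or output.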

In evaluations on external datasets, ShieldGemma surpassed baseline models such as OpenAI's Moderation API and GPT-4.


▲ShieldGemma evaluation results

The ShieldGemma technical report was released at the same time, explaining the model's construction, data sources, and effectiveness. In response tests across the four types of harmful content, all three sizes of ShieldGemma achieved better response rates than GPT-4.


▲ShieldGemma response test

Technical report address:

https://storage.googleapis.com/deepmind-media/gemma/shieldgemma-report.pdf

3. A "microscope" into large models: zero-code analysis of model behavior

To study the inner workings of language models, Google launched Gemma Scope, a comprehensive, open suite of sparse autoencoders. Like a microscope, it helps researchers "see" inside the model and better understand how it works.

Gemma Scope zooms in on specific points in the model using sparse autoencoders (SAEs) that help parse the dense and complex information processed in the model, expanding it into a form that is easier to analyze and understand.
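Conceptually, an SAE maps a dense activation vector into a much wider, mostly-zero feature vector and then reconstructs the original. The toy NumPy sketch below uses random weights purely for illustration; real Gemma Scope SAEs are trained on Gemma 2's activations, and the sizes here are far smaller than in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 8, 64   # toy sizes; real SAEs are far wider

# Random encoder/decoder weights stand in for trained ones.
W_enc = rng.normal(size=(d_model, d_features)) / np.sqrt(d_model)
b_enc = np.full(d_features, -1.0)  # negative bias encourages sparsity
W_dec = rng.normal(size=(d_features, d_model)) / np.sqrt(d_features)

def encode(x):
    # ReLU keeps only features whose pre-activation clears the bias,
    # yielding a sparse, easier-to-inspect representation.
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the dense activation from the sparse features.
    return f @ W_dec

x = rng.normal(size=d_model)       # stand-in for a model activation
features = encode(x)               # wide, mostly-zero feature vector
x_hat = decode(features)           # approximate reconstruction
sparsity = float((features == 0.0).mean())
print(f"active features: {int((features > 0).sum())} of {d_features}")
```

Each nonzero entry of `features` is a candidate interpretable direction; a trained SAE's decoder rows are the feature vectors researchers inspect.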


▲Stylized representation of model activation using SAE interpretation

By studying these expanded views, researchers can understand how Gemma 2 recognizes patterns, processes information, and ultimately makes predictions, exploring how to build AI systems that are more understandable and reliable.

Previously, SAE research mostly studied the inner workings of a single layer of a small or large model. Gemma Scope's breakthrough is that it trains SAEs on the output of every layer and sublayer of the Gemma 2 models, producing more than 400 SAEs that have learned more than 30 million features.


▲Example activation of Gemma Scope’s SAE discovery feature

Gemma Scope is also trained with a new JumpReLU SAE architecture. The original SAE architecture struggles to balance two goals: detecting which features are present and estimating their strength. The JumpReLU architecture achieves this balance more easily, significantly reducing error.

Gemma Scope has released more than 400 free SAEs covering all layers of Gemma 2 2B and 9B, along with interactive demos. Researchers can explore SAE features and analyze model behavior without writing any code.


▲Gemma Scope interactive demonstration

Demo address:

https://www.neuronpedia.org/gemma-scope

Technical report address:

https://storage.googleapis.com/gemma-scope/gemma-scope-report.pdf

Conclusion: generative AI shifts toward small models and AI safety

As generative AI has developed to date, the competition has shifted from racing to scale up parameters to racing to make models lighter and safer, reflecting the priorities of getting closer to users, lowering costs, and better meeting specific needs as the technology lands in real products.

AI PCs and AI phones are gradually entering consumers' lives. In this process, how to fit large models into small end-user devices while safeguarding user privacy and security is an urgent problem for the major AI vendors to solve.