Google finally beats OpenAI: experimental Gemini 1.5 Pro surpasses GPT-4o

2024-08-02

Machine Heart Report

Editors: Chen Chen, Xiao Zhou

Google is giving everyone a free trial of this powerful model.

Over the past two days, Google has been releasing new work back to back. A day after launching Gemma 2 2B, its most capable small model yet, it has now rolled out the experimental Gemini 1.5 Pro (0801).

Users can conduct testing and provide feedback through Google AI Studio and Gemini API.
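
For readers who want to reproduce these tests outside the AI Studio UI, below is a minimal sketch of the same kind of prompt sent through the official google-generativeai Python SDK. The model identifier "gemini-1.5-pro-exp-0801" is an assumption inferred from the article's "0801" label, so check the API's model list for the exact name.

```python
# Minimal sketch: query the experimental model through the Gemini API.
# ASSUMPTION: "gemini-1.5-pro-exp-0801" is inferred from the article's
# "0801" label; verify the exact model ID against the API's model list.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # API key issued via Google AI Studio

model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")
response = model.generate_content("Which number is bigger, 9.9 or 9.11?")
print(response.text)
```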

Since it's free, we ran the popular comparison problem for you. When we asked Gemini 1.5 Pro (0801) which number is bigger, 9.9 or 9.11, the model answered correctly on the first try and explained its reasoning.

When we went on to ask "How many r's are there in the word Strawberry?", however, Gemini 1.5 Pro (0801) failed. Even after we added a step-by-step "incantation" to the prompt, its analysis went wrong at the fourth step.

Google AI Studio test address: https://aistudio.google.com/app/prompts/new_chat

However, judging from the official evaluation results, Gemini 1.5 Pro (0801) is still very capable across the board. The new model quickly topped the well-known LMSYS Chatbot Arena leaderboard with an impressive Elo score of 1300.

This achievement puts Gemini 1.5 Pro (0801) ahead of OpenAI's GPT-4o (Elo: 1286) and Anthropic's Claude 3.5 Sonnet (Elo: 1271), and may signal a shift in the AI landscape.

Simon Tokumine, a key member of the Gemini team, called Gemini 1.5 Pro (0801) the most powerful and smartest Gemini model Google has ever made.

In addition to taking the top spot in Chatbot Arena, Gemini 1.5 Pro (0801) also performed very well in multilingual tasks, mathematics, hard prompts, and coding.

Specifically, Gemini 1.5 Pro (0801) ranked first in Chinese, Japanese, German, and Russian.

In the Coding and Hard Prompts categories, however, Claude 3.5 Sonnet, GPT-4o, and Llama 3.1 405B are still in the lead.

On the win-rate heat map, Gemini 1.5 Pro (0801) posts a 54% win rate against GPT-4o and a 59% win rate against Claude 3.5 Sonnet.
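
For context, Arena Elo ratings map to expected head-to-head win rates through the standard logistic formula E_A = 1 / (1 + 10^((R_B - R_A) / 400)). The sketch below runs that arithmetic on the scores quoted above; the heat map's empirical 54% and 59% sit slightly above the roughly 52% and 54% the ratings alone would predict.

```python
# Expected win rate of A over B under the standard Elo model.
def elo_win_rate(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Arena Elo scores quoted in the article.
print(elo_win_rate(1300, 1286))  # vs GPT-4o: ~0.52
print(elo_win_rate(1300, 1271))  # vs Claude 3.5 Sonnet: ~0.54
```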

Gemini 1.5 Pro (0801) also ranks first on the Vision leaderboard!

Netizens remarked that Google truly exceeded everyone's expectations this time: it suddenly opened up testing of its strongest model without any advance announcement, putting pressure on OpenAI.

While Gemini 1.5 Pro (0801) achieves strong results, it is still at the experimental stage, which means the model may be modified further before it is made widely available.

User Reviews

Some netizens tested the content extraction, code generation, and reasoning abilities of Gemini 1.5 Pro (0801). Let's take a look at their results.

Source: https://x.com/omarsar0/status/1819162249593840110

First, Gemini 1.5 Pro (0801) is powerful at extracting information from images. For example, given an invoice image as input, it writes out the invoice details in JSON format:
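
As a rough illustration, this kind of extraction can be scripted against the Gemini API as sketched below; the file name, prompt wording, and JSON-mode setting are illustrative assumptions rather than the tester's actual setup.

```python
# Sketch: extract invoice details from an image as JSON.
# "invoice.png" and the prompt wording are illustrative placeholders.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")  # assumed model ID

invoice = PIL.Image.open("invoice.png")
response = model.generate_content(
    [invoice, "Extract the invoice details (vendor, date, line items, total) as JSON."],
    generation_config={"response_mime_type": "application/json"},  # JSON mode on Gemini 1.5
)
print(response.text)  # a JSON string with the extracted fields
```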

Next, the PDF content extraction capability of Gemini 1.5 Pro (0801). Taking the classic paper "Attention Is All You Need" as an example, it extracts the paper's table of contents:
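
A similarly hedged sketch of the PDF query, using the SDK's File API to upload the paper before prompting over it; the local file name is an illustrative placeholder.

```python
# Sketch: ask for the table of contents of an uploaded PDF.
# "attention.pdf" is an illustrative local copy of the paper.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")  # assumed model ID

paper = genai.upload_file("attention.pdf")  # File API handle usable in prompts
response = model.generate_content(
    [paper, "List the section headings of this paper as a table of contents."]
)
print(response.text)
```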

Asked to generate a Python game for learning about large language models (LLMs), the model produces the complete code in one go:
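
The generated game itself appears only as a screenshot, so here is a minimal stand-in sketch of the kind of multiple-choice LLM quiz such a request typically yields; the questions below are illustrative, not the model's actual output.

```python
# Minimal sketch of an LLM-knowledge quiz game in the spirit of the
# generated code; questions are illustrative stand-ins, not Gemini's output.
QUESTIONS = [
    {
        "prompt": "What is tokenization?",
        "choices": [
            "Splitting text into units the model processes",
            "Encrypting model weights",
            "Compressing images for training",
        ],
        "answer": 0,
    },
    {
        "prompt": "What does the context window limit?",
        "choices": [
            "The number of GPUs used",
            "How much text the model can attend to at once",
            "The vocabulary size",
        ],
        "answer": 1,
    },
]

def play() -> None:
    score = 0
    for q in QUESTIONS:
        print("\n" + q["prompt"])
        for i, choice in enumerate(q["choices"]):
            print(f"  {i + 1}. {choice}")
        picked = int(input("Your answer (number): ")) - 1
        if picked == q["answer"]:
            score += 1
            print("Correct!")
        else:
            print(f"Wrong. Answer: {q['choices'][q['answer']]}")
    print(f"\nFinal score: {score}/{len(QUESTIONS)}")

if __name__ == "__main__":
    play()
```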

It is worth mentioning that Gemini 1.5 Pro (0801) also provides a detailed explanation of the code, covering the functions it defines, how to play the game, and so on.

The program can be run directly in Google AI Studio and played right away, for example by answering a multiple-choice question on the definition of tokenization:

If you find the multiple-choice questions too simple and boring, you can ask Gemini 1.5 Pro (0801) to generate a more complex game:

The result is a sentence-completion game that tests professional LLM knowledge:
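
As before, the generated code is only shown as a screenshot; below is a minimal illustrative sketch of a sentence-completion game in the same spirit (the sentences are stand-ins, not Gemini's output).

```python
# Minimal sketch of a sentence-completion game about LLM concepts.
ITEMS = [
    ("Self-________ lets each token attend to every other token.", "attention"),
    ("RLHF fine-tunes a model using human ________ on its outputs.", "feedback"),
]

score = 0
for sentence, answer in ITEMS:
    print("\n" + sentence)
    if input("Fill in the blank: ").strip().lower() == answer:
        score += 1
        print("Correct!")
    else:
        print(f"The answer was: {answer}")
print(f"\nScore: {score}/{len(ITEMS)}")
```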

To test the reasoning ability of Gemini 1.5 Pro (0801), a netizen asked a "blowing out candles" question, but the model gave an incorrect answer:

Despite these flaws, Gemini 1.5 Pro (0801) demonstrates visual capabilities close to GPT-4o's, along with code generation, PDF understanding, and reasoning capabilities close to Claude 3.5 Sonnet's. It is worth looking forward to.

Reference video: https://www.youtube.com/watch?v=lUA9elNdpoY