
Grok-2 is here: it can generate and recognize images and performs on par with GPT-4o. Musk: it is progressing at rocket speed

2024-08-14


Machine Heart Report

Synced Editorial Department

Even though GPT-5 is not out yet, Grok has already caught up.

On the same day that Google and OpenAI were competing for headlines, Musk's xAI was not idle either.

On Wednesday afternoon Beijing time, xAI officially released its new-generation large model, Grok-2.



Chatbot Arena, the third-party large model benchmark run by LMSYS, promptly updated its leaderboard. An early version of Grok-2 (sus-column-r) ranked fourth, behind GPT-4o (0513 version) and ahead of Claude 3.5 Sonnet and GPT-4-Turbo.

It excels in coding, complex problems, and mathematics.





Musk couldn't help but boast, "Grok is advancing as fast as a rocket."



Note that this is only the score of an early version. Chatbot Arena said that it will test the official version later.

Musk said that Grok-2 is an advanced language model with state-of-the-art reasoning capabilities. The new generation comes in two versions, Grok-2 and Grok-2 mini. Both are now available to Grok users on the X platform, specifically X Premium and Premium+ subscribers.

Compared with the previous Grok-1.5, the early preview version of Grok-2 has made significant progress and demonstrated leading capabilities in chat, reasoning, and coding. xAI said that Grok-2 and Grok-2 mini are currently in beta on X and will be offered through an enterprise API later this month.

Less than half an hour after the new models were released, some users were already showing off their results. One used Grok-2 mini to generate an image of "Me and Musk eating hot dogs".





Another attempt generated a portrait of Washington.



Others tried Grok-2 mini and generated a flying cat.



Someone generated a Tesla Model Y. It looks quite similar, doesn't it?



Grok-2 Performance Comparison

Because xAI entered an early version of Grok-2, "sus-column-r", into Chatbot Arena, we can see how its performance compares with other popular open-source and closed-source models.

In terms of overall Elo score, Grok-2 outperforms the Claude series models and most versions of GPT-4. First place, of course, goes to GPT-4o (August 8 version), which OpenAI released just days ago.
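For readers unfamiliar with Elo-style ratings, the gap between two scores maps to an expected head-to-head win rate. The sketch below uses the standard Elo expected-score formula. The function name and the ratings are illustrative assumptions, not values from the leaderboard, and Chatbot Arena's own rating computation may differ in detail.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of model A against model B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, for illustration only (not real leaderboard numbers).
model_a, model_b = 1300.0, 1280.0
print(f"A is expected to win {expected_win_rate(model_a, model_b):.1%} of matchups")
```

Under this model a gap of a few dozen points corresponds to only a modest edge in pairwise preference, which is worth keeping in mind when comparing closely ranked models.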



The figure below shows the win rate comparison between Grok-2 and other popular models.



The chart below compares the win rates of Grok-1.5 and Grok-2.



xAI evaluates the Grok-2 model with the following process: AI Tutors interact with the model on a variety of tasks that reflect real-world use. During each interaction, Grok-2 provides two responses to the AI Tutors, who then select the better one based on specific criteria listed in the guidelines.

xAI focused on evaluating model performance in two key areas: following instructions and providing accurate, factual information. The results showed that Grok-2 has made significant progress in reasoning over retrieved content and in tool use, for example correctly identifying missing information, reasoning through sequences of events, and discarding irrelevant posts.
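A pairwise preference evaluation of this kind is typically summarized as a win rate per model. The following is a minimal sketch of that tally, not xAI's actual tooling: the `win_rates` function, the tuple layout, and the sample judgments are all hypothetical.

```python
from collections import Counter


def win_rates(judgments):
    """Compute per-model win rates from pairwise judgments.

    `judgments` is an iterable of (model_a, model_b, winner) tuples,
    where `winner` must be either model_a or model_b (assumed format).
    """
    wins = Counter()
    games = Counter()
    for model_a, model_b, winner in judgments:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        games[model_a] += 1
        games[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / games[model] for model in games}


# Made-up judgments for illustration only.
sample = [
    ("grok-2", "grok-1.5", "grok-2"),
    ("grok-2", "grok-1.5", "grok-2"),
    ("grok-2", "grok-1.5", "grok-1.5"),
    ("grok-2", "grok-1.5", "grok-2"),
]
print(win_rates(sample))
```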

Benchmark Test Scores

xAI evaluated the Grok-2 model on a range of academic benchmarks, including reasoning, reading comprehension, mathematics, science, and coding.

Both Grok-2 and Grok-2 mini are significant improvements over the previous Grok-1.5 model. Their performance is comparable to other state-of-the-art models in areas such as graduate-level scientific knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH).

In addition, Grok-2 performs well on vision-based tasks, with strong results in visual mathematical reasoning (MathVista) and document-based question answering (DocVQA).



Grok-2 interface and features get a major overhaul

Over the past few months, xAI has been continuously improving the Grok experience on the X platform. Now, with the launch of the next-generation Grok-2, xAI has redesigned the interface as shown below.



xAI also showcases some new capabilities, such as producing a simple implementation of Conway's Game of Life.
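For reference, a simple Game of Life implementation can be very short. The sketch below is our own illustration in Python, not the code Grok produced in xAI's demo; the `step` function and the set-of-coordinates representation are assumptions chosen for brevity.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count how many live neighbors each cell on the (unbounded) grid has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 neighbors,
    # or with 2 neighbors if it is already alive.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(1, 0), (1, 1), (1, 2)}  # horizontal bar of three cells
print(sorted(step(blinker)))  # [(0, 1), (1, 1), (2, 1)], a vertical bar
```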



Another example is multimodal understanding (describing images).



Of the two models, Grok-2 is xAI's most advanced AI assistant. It offers text and visual understanding and integrates real-time information from the X platform, and it can be accessed through the Grok tab in the X app.

Grok-2 mini is a small but powerful model that strikes a good balance between speed and answer quality.



More intuitive, controllable, and flexible than its predecessor, Grok-2 is suitable for a variety of tasks, whether you are searching for answers, collaborating on writing, or solving coding tasks.

Additionally, xAI is working with startup Black Forest Labs to trial their FLUX.1 model to extend Grok’s capabilities on X.



Later this month, xAI will also release Grok-2 and Grok-2 mini to developers through a new enterprise API platform. The upcoming API is built on a new custom technology stack that allows multi-region inference deployments for global low-latency access.

xAI also offers some enhanced security features, such as mandatory multi-factor authentication (e.g. using a YubiKey, Apple Touch ID, or TOTP).

As you can see, xAI has been advancing this series of models at an astonishing pace since the launch of Grok-1 in November 2023. Soon, they will release a preview version with multimodal understanding. The focus of xAI afterwards will be to improve the core reasoning capabilities of the model through new computing clusters.

Blog address: https://x.ai/blog/grok-2