
Scan whatever stumps you! The world's strongest math model is now playable online, with OCR powered by Alibaba's multimodal model

2024-08-20


Hengyu from Aofei Temple
Quantum Bit | Public Account QbitAI

Now, everyone can play with the most powerful mathematical model!

Overnight, Alibaba's Qwen (Tongyi Qianwen) large model team released a demo of Qwen2-Math, now playable online on Hugging Face.

The pleasant surprise: if typing out mathematical formulas feels tedious, you can simply screenshot or scan the problem you want to ask, upload the image, and have it solved.

It's quite convenient.



The demo interface states clearly: "The OCR function of this demo is powered by Qwen2-VL from Alibaba's Qwen large model team; the mathematical reasoning ability is powered by Qwen2-Math."

Alibaba senior algorithm expert Junyang Lin further explained in the replies on X (Twitter):

For now, Qwen2-VL and Qwen2-Math each handle their own part.
But in the near future, we will combine the multimodal capabilities and the mathematical reasoning capabilities into a single model.



Many netizens are quite satisfied with this interactive mode:

Very good! Upload it as an image, then wait for the big model to solve it. Love it!



So, how capable is Qwen2-Math, the reputed strongest math model?

How well does it work? Let's try it out right now

It's time for Qwen2-Math to overcome all obstacles!

Let’s start with a few simple math problems to whet your appetite.

One note before you try it: Qwen2-Math does not stream output while it computes; it displays the full solution process and result only after the calculation finishes.

(And presumably, as more and more people piled in, generation times gradually lengthened.)

Question 1: Solve A × A + A = 240 for A.

Qwen2-Math gives the correct answer: A = 15 or A = -16.



Question 2: Given a value of a, evaluate the expression.

Qwen2-Math calculated that the answer is 0, which is also correct.



Question 3: (A+3)(A+4)(A+5) = 120; find the value of A.

Bingo! The answer is 1.
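These warm-up answers are easy to verify by substitution; a minimal check in plain Python (the quadratic A² + A = 240 factors as (A − 15)(A + 16)):

```python
# Quick numeric check of the warm-up answers (no external libraries).

# Question 1: A*A + A = 240 has integer roots 15 and -16,
# since A^2 + A - 240 = (A - 15)(A + 16) = 0.
for a in (15, -16):
    assert a * a + a == 240

# Question 3: (A+3)(A+4)(A+5) = 120 is satisfied by A = 1,
# since 4 * 5 * 6 = 120.
a = 1
assert (a + 3) * (a + 4) * (a + 5) == 120

print("warm-up answers verified")
```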



OK, warm-up over; let's raise the difficulty for Qwen2-Math.

Then here is a standard question for evaluating large (mathematical) models:

Which is bigger, 9.9 or 9.11?



Qwen2-Math confidently answered:

9.9 is bigger than 9.11!



Then make it a little more difficult!

Give it a question that only GPT-4o has answered correctly so far:

An alien arriving on Earth might choose to do one of the following four things:
1. Self-destruction;
2. Split into two aliens;
3. Split into three aliens;
4. Do nothing.
Every day thereafter, each alien will make a choice independently of each other.
Find the probability that there will be no aliens on Earth in the end.

Qwen2-Math took about 30 seconds to answer this question: 1.

Unfortunately, the answer is wrong. The correct answer is √2 − 1.
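For the record, the correct answer follows from a standard branching-process extinction argument. A sketch: let p be the probability that a single alien's lineage eventually dies out, and condition on its first choice (each option has probability 1/4):

```latex
% p = extinction probability of one alien's lineage; condition on its
% first choice (self-destruct, split into 2, split into 3, do nothing).
% A split into k aliens dies out only if all k lineages die out (p^k):
p = \tfrac{1}{4} + \tfrac{1}{4}\,p^{2} + \tfrac{1}{4}\,p^{3} + \tfrac{1}{4}\,p
% Multiply by 4, rearrange, and factor:
p^{3} + p^{2} - 3p + 1 = 0
(p - 1)\left(p^{2} + 2p - 1\right) = 0
% Roots: 1, \; -1 + \sqrt{2}, \; -1 - \sqrt{2}.
% The mean offspring count (0 + 2 + 3 + 1)/4 = 1.5 > 1, so the process is
% supercritical and the extinction probability is the smallest root in [0,1]:
p = \sqrt{2} - 1 \approx 0.414
```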



Browsing the comment sections of the major platforms, we found that besides calculation errors, there is another possible cause of incorrect answers:

Qwen2-VL itself sometimes errs when recognizing the question from the image.

With a mistake in that very first step, the big model's final answer naturally cannot be correct.



Meanwhile, Junyang Lin also noted in the replies:

Our Qwen2-Math can't do geometry problems yet.



You can also ask questions in Chinese

Our protagonist, Qwen2-Math, is built on Qwen2, the open-source Tongyi Qianwen large language model, and was released by Alibaba's Qwen large model team ten days ago.

It is dedicated to mathematical problem solving and can handle competition-level exam questions.

Qwen2-Math has three parameter versions:

72B, 7B and 1.5B.



Based on Qwen2-Math-72B, the Qwen team also fine-tuned an Instruct version.

This is also the flagship of the Qwen2-Math series. The team first trained a math-specific reward model, combined its reward signal with a binary correct/incorrect judgment signal as the training label, constructed supervised fine-tuning (SFT) data through rejection sampling, and finally optimized on top of the SFT model using GRPO.
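The rejection-sampling step can be sketched in a few lines. This is a hypothetical illustration, not the actual pipeline: `generate_solutions`, `reward_score`, and `is_correct` are stand-ins for sampling from the policy model, scoring with the math-specific reward model, and checking the final answer.

```python
import random

def generate_solutions(problem, n=8):
    # Stand-in: sample n candidate solutions from the current model.
    return [f"solution-{i} to {problem}" for i in range(n)]

def reward_score(problem, solution):
    # Stand-in: scalar score from the math-specific reward model.
    return random.random()

def is_correct(problem, solution):
    # Stand-in: check the candidate's final answer against ground truth.
    # (Pretend only the first sample happens to be correct.)
    return solution.startswith("solution-0")

def build_sft_data(problems, n=8):
    """Per problem, keep the correct candidate the reward model rates highest.

    This combines the reward signal with the correct/incorrect judgment:
    incorrect candidates are rejected outright, and the reward model breaks
    ties among the correct ones.
    """
    sft_data = []
    for problem in problems:
        candidates = generate_solutions(problem, n)
        correct = [s for s in candidates if is_correct(problem, s)]
        if correct:
            best = max(correct, key=lambda s: reward_score(problem, s))
            sft_data.append({"prompt": problem, "response": best})
    return sft_data

data = build_sft_data(["integrate x^2 from 0 to 1"])
print(len(data))  # prints 1
```

The GRPO stage would then further optimize the SFT model against the same reward model; that reinforcement-learning step is beyond this sketch.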

Qwen2-Math-72B-Instruct handles a variety of math problems, including algebra, geometry, counting and probability, and number theory, with an accuracy of 84%.

Upon release it claimed the top spot among math models, scoring 7 points higher than GPT-4o on the MATH benchmark, a 9.6% relative improvement.

It directly surpasses the open-source Llama 3.1-405B and closed-source models such as GPT-4o and Claude 3.5.



As of press time, Qwen2-Math-72B-Instruct has been downloaded more than 13.2k times on Hugging Face.

And here’s a new discovery:

Although the team says Qwen2-Math is currently aimed mainly at English scenarios, it can still answer if you ask it a question in Chinese.

It just replies to you in English.

Reportedly, Chinese and English versions of Qwen2-Math will be released later.

Reference Links:
[1]https://huggingface.co/spaces/Qwen/Qwen2-Math-Demo
[2]https://x.com/Alibaba_Qwen/status/1825559009497317406
[3]https://x.com/JustinLin610/status/1825559557411860649