
Amid a fierce price war, small AI models soar

2024-07-30


The boom in small AI models has opened a new arena for competition among the AI giants.

The price war over large models is intensifying, and AI companies that have poured in huge sums urgently need to take their business stories a step further. Recently, they have launched low-cost, easy-to-deploy small models of their own, and a new round of competition has begun.

First, Hugging Face released SmolLM in three sizes, 135M, 360M, and 1.7B parameters. Trained on only 650B tokens, the models nonetheless outperform Qwen 1.5B and Phi 1.5B.

The next day, Mistral AI and NVIDIA jointly released Mistral NeMo, billed as "Mistral AI's best small model". It is easy to use and can serve as a drop-in replacement for any system using Mistral 7B.

On the same day, OpenAI launched a mini version of GPT-4o, GPT-4o mini, calling the new model "the most powerful and cost-effective small-parameter model" and putting it front and center, replacing GPT-3.5 in the web version of ChatGPT.

Apple was not to be outdone, releasing the DCLM model on the same day as OpenAI and open-sourcing it immediately. Vaishaal Shankar, a research scientist on Apple's ML team, said, "This is the best performing truly open source model to date."

These models have few parameters and a small memory footprint. In specific scenarios, after fine-tuning, their performance can rival that of large models, making them a cost-effective choice.

"Small models are definitely easier to realize value." Xu Xiaotian, chief architect of data and artificial intelligence at IBM China, said in an interview with a reporter from 21st Century Business Herald: "A group of professional small models cooperate with agents to realize the integration of business flows, which will be more feasible in terms of functionality and economy."

On the generative AI battlefield, iteration is extremely fast. Today's "best" may be beaten by tomorrow's new release, and "records" are constantly being overturned and rewritten. "Models are updated too quickly to judge. One vendor claims to be the 'biggest and best'; the next claims to be the 'smallest and best'." A veteran observer of the AI industry told the 21st Century Business Herald reporter that with so many AI models on the market, AI companies must work ten or a hundred times harder to run a viable business story.


Image source: Creative Graphics Xu Shuxing


Small model track opens

AI giants are releasing small models at a rapid clip, competing not only on performance but also on price.

According to the OpenAI website, in benchmarks such as MMLU, MGSM, HumanEval, and MMMU, GPT-4o mini demonstrates better textual and visual reasoning, mathematical reasoning, coding, and multimodal reasoning than GPT-3.5 Turbo, Gemini Flash, Claude Haiku, and other small models. Its mathematical reasoning and coding in particular far outperform GPT-3.5 Turbo and the other small models, though they remain slightly weaker than GPT-4o's. In the latest LMSYS blind-test arena rankings, GPT-4o mini tied with GPT-4o for first place. Even OpenAI CEO Sam Altman could not hide his excitement, posting on social media: "We have never been so excited about any evaluation."

Beyond strong performance, OpenAI also played the low-price card. At launch on July 18, OpenAI announced that GPT-4o mini would cost 15 cents per million input tokens and 60 cents per million output tokens, more than 60% cheaper than GPT-3.5 Turbo. On July 24, OpenAI further announced that, through September 23, it would offer GPT-4o mini fine-tuning free of charge to Tier 4 and Tier 5 users, capped at 2 million tokens per day, with overage billed at US$3 per million tokens. OpenAI said: "We expect GPT-4o mini to expand the range of AI applications and make artificial intelligence more affordable."
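As a quick sanity check on the "more than 60% cheaper" claim, the sketch below works through the arithmetic in Python. The GPT-4o mini rates come from the article; the GPT-3.5 Turbo rates ($0.50 input / $1.50 output per million tokens) and the example workload are assumptions for illustration, not figures from the article.

```python
# Back-of-the-envelope API cost comparison using per-million-token prices.
def api_cost(input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of a workload given prices per 1M input/output tokens."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Hypothetical workload: 10M input tokens and 2M output tokens per day.
mini = api_cost(10_000_000, 2_000_000, 0.15, 0.60)   # GPT-4o mini (article's prices)
turbo = api_cost(10_000_000, 2_000_000, 0.50, 1.50)  # GPT-3.5 Turbo (assumed prices)

print(f"GPT-4o mini:   ${mini:.2f}/day")   # $2.70/day
print(f"GPT-3.5 Turbo: ${turbo:.2f}/day")  # $8.00/day
print(f"Savings: {1 - mini / turbo:.0%}")  # 66%
```

On this assumed workload the mini model comes out about 66% cheaper, consistent with the figure OpenAI quoted.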

A Ping An Securities research report argues that GPT-4o mini is a new generation of entry-level AI "small model" that pairs performance with a significantly lower price. Large models worldwide are gradually shifting from competing on performance alone to weighing performance and practicality equally: once large models reach a certain capability level, they inevitably move toward application. Large-model vendors are expected to accelerate a commercial closed loop across the industry chain by improving the cost-effectiveness of their products and pushing the deployment of downstream applications.

Apple's DCLM model, released on the heels of GPT-4o mini, is also eye-catching. DCLM fully open-sources its code, weights, training process, and datasets, and comes in two sizes: 1.4 billion and 7 billion parameters. The 7-billion-parameter version surpasses Mistral-7B and approaches Llama 3 and Gemma in performance. On the MMLU (5-shot) benchmark, DCLM-7B scores 63.7% accuracy. According to the researchers, that is 6.6 percentage points higher than MAP-Neo, the previous state of the art among open-data language models, while using 40% less compute. More importantly, the result surpasses Mistral-7B-v0.3 (62.7%) and approaches Gemma 8B (64.3%), Llama 3 8B (66.2%), and Phi-3 7B (69.9%).

Rather than "bigger is better", Apple prefers the small-model route. In April this year, Apple announced OpenELM, a family of four pre-trained models that are extremely small and already move toward the goal of "letting artificial intelligence run locally on Apple devices".

In June, Apple revealed its AI development roadmap, planning to embed small models smoothly into mobile devices. This would not only deliver on "faster and safer", but also settle, in one stroke, the problem of integrating models into mobile devices.

Mistral NeMo was built by Mistral AI in collaboration with NVIDIA. After advanced fine-tuning and alignment, the model excels at following precise instructions, reasoning, handling multi-turn conversations, and generating code. Mistral NeMo is aimed mainly at enterprise environments, with the goal of letting enterprises deploy AI solutions without massive cloud resources.

In an interview with VentureBeat, Bryan Catanzaro, NVIDIA's vice president of applied deep learning research, elaborated on the advantages of small models: "Small models are easier to acquire and run, and can have different business models, because people can run them at home on their own systems."

As large models enter the second half of the game, moving from technology to application, the market is increasingly drawn to efficient, low-cost models that are easier to deploy locally, reflecting demands for security, privacy, efficiency, and cost-effectiveness.

Industry analysts see a clear new trend in AI deployment: models that run efficiently on local hardware are dispelling many companies' concerns about adopting AI at scale, such as data privacy, latency, and high costs. "This may level the playing field; small companies with limited resources can also be empowered by AI models, bridging the inherent gap between them and large companies."


Behind the entry into the small model track

Why are AI giants opening up the small model track? Part of the reason may be cost considerations.

Large models are expensive to develop and run, a heavy burden even for a giant like OpenAI.

A recent insider analysis suggested that "OpenAI may lose $5 billion this year and faces the risk of running out of funds within 12 months". As of March this year, OpenAI had spent nearly $4 billion renting Microsoft's servers to run ChatGPT and its underlying large language models (LLMs). On top of running ChatGPT, OpenAI's training costs, including data fees, may soar to $3 billion this year. According to people familiar with the matter, OpenAI accelerated the training of new models last year beyond its original plan: it had budgeted about $800 million for such costs but ended up spending far more.

In contrast, small models are cheap, respond quickly, and can run locally, making them better suited to personalized, targeted use. As industry insiders put it: "With global AI hardware in short supply, small models mean lower deployment and training costs, and their output is good enough for certain specific tasks."

A business manager at a domestic AI company told the 21st Century Business Herald reporter that a small parameter count significantly cuts inference costs, and the hardware needed to train and tune such a model costs far less than for a large one; experienced developers can even train vertical models cheaply.

Andrej Karpathy, a founding member of OpenAI and former senior director of AI at Tesla, recently made a representative prediction: the size race among generative models will reverse, and the competition will be over whose model is smaller and smarter.

In Karpathy's telling, today's models are so large because training is still very wasteful. Large models are excellent memorizers, but that also means they memorize masses of irrelevant detail that a specific problem should never need to call upon.

For small models, the training objective becomes simpler and more direct, letting the model learn useful information more efficiently.

But large and small models are not an either-or choice; their development paths still inform each other.

Andrej Karpathy said: "The model must become larger before it can become smaller. We need large models to reconstruct and reshape data into an ideal form. One model helps generate training data for the next, gradually producing a perfect training set that is then fed to the small model. The small model does not need to remember all the knowledge completely; it only occasionally needs to look something up to ensure accuracy."

Robin Li also said at Baidu AI Developer Conference Create 2024 that in the future, large-scale native AI applications will basically be based on the Moe architecture, that is, a mixture of large and small models. Robin Li also said that compressing and distilling a basic model from a large model and then using data to train it is much better than training a small model from scratch, and is better, faster, and cheaper than models trained based on open source models.