
Microsoft and Nvidia are both betting on small models. Are large models no longer attractive?

2024-08-26


In the development of artificial intelligence, technology giants once raced to build ever-larger language models. Now a new trend has emerged: small language models (SLMs) are gaining ground, challenging the long-held assumption that "bigger is better".

[Photo credit: Visual China]

On August 21, local time, Microsoft and NVIDIA each released new small language models: Microsoft's Phi-3.5-mini-instruct and NVIDIA's Mistral-NeMo-Minitron 8B. The main selling point of both models is a good balance between computing resource usage and performance; in some respects, they are even comparable to large models.

Clem Delangue, CEO of AI startup Hugging Face, has argued that up to 99% of usage scenarios can be handled by SLMs, and predicted that 2024 will be the year of the SLM. By one incomplete count, technology giants including Meta, Microsoft, and Google have released nine small models this year.

The cost of training large models is rising

The rise of SLMs is no accident; it is closely tied to the challenges that large language models (LLMs) face in improving performance and containing resource consumption.

Performance comparisons published by AI startups Vellum and Hugging Face in April this year show that the performance gap between LLMs is narrowing rapidly, especially in specific tasks such as multiple-choice questions, reasoning, and math problems, where the differences between top models are minimal. For example, in multiple-choice questions, Claude 3 Opus, GPT-4, and Gemini Ultra all have an accuracy rate of over 83%, while in reasoning tasks, Claude 3 Opus, GPT-4, and Gemini 1.5 Pro all have an accuracy rate of over 92%.

Gary Marcus, former head of Uber AI, noted: “I think everyone would say GPT-4 is a step ahead of GPT-3.5, but there hasn’t been any qualitative leap in more than a year since then.”

Against these limited performance gains, the cost of training LLMs keeps climbing. Training these models requires massive amounts of data and involves billions or even trillions of parameters, resulting in extremely high resource consumption. The computing power and energy needed to train and run LLMs are staggering, making it difficult for small organizations or individuals to participate in core LLM development.

The International Energy Agency estimates that by 2026, electricity consumption related to data centers, cryptocurrencies and artificial intelligence will be roughly equivalent to the electricity consumption of Japan.

OpenAI CEO Sam Altman once said at an MIT event that training GPT-4 cost at least $100 million, while Anthropic CEO Dario Amodei predicted that training future models could cost as much as $100 billion.

In addition, the complexity of the tools and techniques required to work with LLMs steepens the learning curve for developers. The entire process, from training to deployment, is time-consuming and slows development. A study by the University of Cambridge shows that it can take companies 90 days or more to deploy a machine learning model.

Another significant problem with LLMs is that they are susceptible to “hallucinations” — outputs generated by the model that appear plausible but are not actually correct. This is because LLMs are trained to predict the next most likely word based on patterns in the data, rather than to truly understand the information. As a result, LLMs may confidently generate false statements, make up facts, or combine unrelated concepts in nonsensical ways. Detecting and reducing these “hallucinations” is an ongoing challenge in developing reliable and trustworthy language models.
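To make that mechanism concrete, here is a minimal sketch (not drawn from any model discussed in this article) showing that a causal language model simply ranks candidate next tokens by probability, with no built-in check on factual correctness. It assumes the open-source Hugging Face transformers library and the publicly available gpt2 checkpoint.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal language model illustrates the point; gpt2 is simply a convenient public checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of Australia is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The model only scores which token is most likely to come next;
# nothing here verifies whether the resulting completion is actually true.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r:>12}  p={float(p):.3f}")

Sampling from this distribution produces fluent text, but a plausible-sounding wrong answer is scored in exactly the same way as a correct one, which is how hallucinations arise.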

Small models reduce costs

Concerns about LLMs' enormous energy requirements, along with the market opportunity to offer enterprises more diverse AI options, have led technology companies to gradually turn their attention to SLMs.

"Daily Economic News" reporters noticed that whether it is AI start-ups such as Arcee, Sakana AI and Hugging Face, or technology giants, they are attracting investors and customers through SLM and more economical methods.

Google, Meta, OpenAI, and Anthropic have all previously released small models that are more compact and flexible than their flagship LLMs. This not only reduces development and deployment costs but also gives commercial customers a cheaper option. Given that investors are increasingly concerned about the high costs and uncertain returns of AI companies, more technology companies may choose this path. Even Microsoft and NVIDIA have now launched small models of their own.

SLMs are streamlined versions of LLMs, with fewer parameters and simpler designs. They require less data and training time—just minutes or hours. This makes SLMs more efficient and easier to deploy on small devices. For example, they can be embedded in mobile phones without occupying supercomputing resources, thereby reducing costs and significantly improving response speed.

Another major advantage of SLMs is their specialization for specific applications. SLMs focus on specific tasks or domains, which makes them more efficient in real-world applications. For example, SLMs often outperform general-purpose models in sentiment analysis, named entity recognition, or domain-specific question answering. This customization enables companies to create models that are efficient for their specific needs.
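As an illustration of this task-specific approach, the sketch below uses a compact sentiment classifier rather than a general-purpose LLM. It assumes the open-source Hugging Face transformers library; the DistilBERT checkpoint named here (roughly 67 million parameters) is a public example, not one of the models mentioned in this article.

from transformers import pipeline

# A small model fine-tuned for a single narrow task: binary sentiment classification.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The battery life on this phone is fantastic.",
    "Support never answered my ticket.",
]
for review, result in zip(reviews, classifier(reviews)):
    # Each result is a dict with a predicted label and a confidence score.
    print(f"{result['label']:>8}  ({result['score']:.2f})  {review}")

Because the model only has to do one narrow job, it can run quickly on modest hardware, which is the trade-off described here: lower cost and faster responses in exchange for a narrower scope.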

SLMs are also less prone to “hallucinations” within a specific domain because they are typically trained on narrower, more targeted datasets, which helps the model learn the patterns and information most relevant to its task. The focus of SLMs reduces the likelihood of generating irrelevant, unexpected, or inconsistent outputs.

Despite their smaller size, SLMs match larger models in some respects. Microsoft's latest Phi-3.5-mini-instruct has only 3.8 billion parameters, yet outperforms models with far more parameters, such as Llama 3.1 8B and Mistral 7B. Aaron Mueller, a language model researcher at Northeastern University in Boston, pointed out that scaling up the number of parameters is not the only way to improve model performance; training on higher-quality data can produce similar results.

OpenAI CEO Altman said at an event in April that he believes we are at the end of the era of giant models and “we will improve their performance in other ways.”

However, it is important to note that while the specialization of SLMs is a major advantage, it also brings limitations. These models may perform poorly outside their specific training domains, lack a broad knowledge base, and cannot generate relevant content across as wide a range of topics as LLMs. As a result, users may need to deploy multiple SLMs to cover different needs, complicating their AI infrastructure.

As the field of AI develops rapidly, the standard for what counts as a small model may keep shifting. David Ha, co-founder and CEO of Sakana AI, a small-model startup in Tokyo, said that AI models that seemed huge a few years ago now look "moderate." "Size is always relative," he said.

Daily Economic News
