AI unicorn Mistral AI unveils two new models with strong coding and mathematical capabilities

2024-07-17

Zhidongxi (Official Account: zhidxcom)

Compiled by Luo Tianjin

Editor | Yunpeng

According to a VentureBeat report on July 17, French AI startup Mistral AI recently launched two new AI models: Codestral Mamba 7B, a code generation model for programmers and developers, and Mathstral 7B, an AI model designed for math-related reasoning and scientific discovery.

Codestral Mamba 7B offers faster inference and a longer context, responding quickly even with long input text. The model can handle inputs of up to 256,000 tokens, twice as many as GPT-4o.

Mathstral 7B has a 32K context window and is released under the Apache 2.0 open source license. With more inference-time computation, it can achieve better benchmark results than other mathematical reasoning models, and the model can also be fine-tuned.

1. The code generation model can handle longer contexts

Mistral AI, the well-funded French AI startup known for its powerful open-source AI models, today launched two new entries in its growing family of Large Language Models (LLMs): a math-focused model, and a code generation model for programmers and developers. The code model is based on Mamba, a new architecture that other researchers developed late last year.

Mamba aims to improve on the efficiency of the Transformer architecture used by most leading LLMs by replacing its attention mechanism with a selective state space model. Compared with the more common Transformer-based models, Mamba-based models can offer faster inference and longer context windows. Other companies and developers, including AI21, have released AI models based on it.
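To illustrate why a recurrent, state-space style model can keep the cost of each new token flat as the input grows, here is a toy sketch. It is not Mamba's actual implementation: the function name and the scalar coefficients a, b and c are hypothetical, and real Mamba layers use vector-valued states with input-dependent parameters.

```python
def ssm_scan(inputs, a=0.5, b=1.0, c=1.0):
    """Toy linear state-space recurrence (hypothetical coefficients).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (output)
    """
    h = 0.0          # fixed-size state, regardless of how long the input is
    outputs = []
    for x in inputs:
        # Each step touches only the current input and the running state,
        # so per-token work does not grow with the sequence length.
        h = a * h + b * x
        outputs.append(c * h)
    return outputs


print(ssm_scan([1.0, 1.0, 1.0]))  # prints [1.0, 1.5, 1.75]
```

By contrast, standard attention compares each new token against all previous tokens, so its per-token work grows with the length of the context.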

Mistral AI has built Codestral Mamba 7B on this new architecture, and the model provides fast response times even with long input text. Codestral Mamba is suited to code productivity use cases, especially more local coding projects.

Mistral AI tested the model, which will be free to use on Mistral AI's la Plateforme API, on inputs of up to 256,000 tokens, twice as many as OpenAI's GPT-4o supports.

Mistral AI reports that Codestral Mamba outperforms the competing open source models CodeLlama 7B, CodeGemma-1.1 7B, and DeepSeek on benchmarks such as HumanEval.

Developers can modify and deploy Codestral Mamba from its GitHub repository and HuggingFace. It is released under the open source Apache 2.0 license.

Mistral AI claims that an earlier version of Codestral outperformed other code generators such as CodeLlama 70B and DeepSeek Coder 33B.

Code generation and coding assistants have become widely used applications of AI models, with platforms such as GitHub's Copilot (powered by OpenAI), Amazon's CodeWhisperer, and Codeium growing in popularity.

2. The mathematical reasoning model delivers strong performance and can be fine-tuned

The second model launched by Mistral AI is Mathstral 7B, an AI model designed for math-related reasoning and scientific discovery. Mistral AI developed Mathstral through Project Numina.

Mathstral has a 32K context window and is released under the Apache 2.0 open source license. Mistral AI claims that the model outperforms all other models designed for mathematical reasoning, and that it can achieve "significantly better results" on benchmarks when given more inference-time computation. Users can use it as is or fine-tune it.
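One common way to spend more computation at inference time is to sample several answers to the same problem and keep the most frequent one (majority voting). The sketch below assumes this strategy purely for illustration; the article does not say which method Mistral AI used, the function name is hypothetical, and the hard-coded sample answers stand in for real model outputs.

```python
from collections import Counter


def majority_vote(candidate_answers):
    """Return the most frequent final answer among sampled candidates."""
    if not candidate_answers:
        raise ValueError("need at least one candidate answer")
    # most_common(1) returns a list with the single (answer, count) pair
    # that has the highest count.
    answer, _count = Counter(candidate_answers).most_common(1)[0]
    return answer


# Placeholder answers standing in for five sampled model completions.
samples = ["42", "41", "42", "42", "40"]
print(majority_vote(samples))  # prints 42
```

Sampling more candidates costs more compute per question, which is the trade-off behind results reported "with more inference-time computation".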

“Mathstral is another example of achieving outstanding performance when building models for a specific purpose — a development philosophy we actively promote at la Plateforme, especially with its new fine-tuning capabilities,” Mistral AI said in a blog post.

Mathstral can be accessed through Mistral AI's la Plateforme and HuggingFace.

Mistral AI tends to release its models as open source, and the company competes with other AI developers such as OpenAI and Anthropic.

The company recently raised $640 million in a Series B round at a valuation of nearly $6 billion, and has received investments from tech giants such as Microsoft and IBM.

Conclusion: The large-model performance race reaches new heights

From an industry perspective, Mistral AI's new models highlight the trend towards specialized AI tools. By providing powerful and accessible models such as Mathstral 7B and Codestral Mamba 7B, Mistral AI is becoming an important player in the AI field and is promoting the development of innovative, practical applications.

These models also highlight the importance of open source AI, which encourages collaboration and greater transparency within the technical community. By making powerful AI tools available to a wider audience, it further accelerates the iteration and development of large AI models.

Source: VentureBeat