2024-09-06
IT Home reported on September 6 that the official WeChat account of Mianbi Intelligence published a blog post yesterday (September 5) announcing the launch of the open-source MiniCPM3-4B AI model, claiming that "the on-device ChatGPT moment has arrived."
MiniCPM3-4B is the third-generation product in the MiniCPM series. Its overall performance exceeds that of Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125, and is comparable to many AI models with 7 billion to 9 billion parameters.
Compared with MiniCPM 1.0 / MiniCPM 2.0, MiniCPM3-4B has a more powerful and versatile skill set and can be used for a wider range of purposes. MiniCPM3-4B supports function calling and a code interpreter.
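To illustrate what function calling involves, below is a minimal sketch of the common pattern: the application declares a tool schema, the model emits a structured tool call, and the application parses and executes it. The `get_weather` tool and the JSON call format here are illustrative assumptions, not the exact schema MiniCPM3-4B uses.

```python
import json

# Hypothetical tool schema in the OpenAI-style format commonly used for
# function calling; MiniCPM3-4B's actual schema may differ.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def parse_tool_call(model_output: str):
    """Extract a tool call from model output assumed to be a JSON object
    of the form {"name": ..., "arguments": {...}} (illustrative format)."""
    call = json.loads(model_output)
    return call["name"], call["arguments"]

# Simulated model output, for demonstration only:
name, args = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Beijing"}}')
print(name, args["city"])  # get_weather Beijing
```

The application would then dispatch `name` to the matching tool, run it, and feed the result back to the model as another chat turn.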
Below are the structural differences between the three generations of the model (1 → 2 → 3):
Vocabulary size: 123K → 73K → 73K
Number of layers: 40 → 52 → 62
Hidden size: 2304 → 1536 → 2560
Maximum context length: 4K → 4K → 32K
System prompt: not supported → not supported → supported
Tool calling and code interpreter: not supported → not supported → supported
MiniCPM3-4B has a 32K context window. With LLMxMapReduce, MiniCPM3-4B can process theoretically unlimited context without excessive memory use.
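The map-reduce idea behind such long-context handling can be sketched as follows: split the input into chunks that each fit the context window, run the model over each chunk (map), then combine the partial results in a final pass (reduce). This is a generic sketch of the pattern, not LLMxMapReduce's actual implementation; `summarize` stands in for a model call.

```python
def map_reduce_answer(document: str, question: str, summarize, chunk_size: int = 2000) -> str:
    """Answer a question over a document longer than the context window.

    `summarize(text, question)` is a placeholder for invoking the model
    on one window-sized piece of text.
    """
    # Split the document into window-sized chunks.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    # Map step: process each chunk independently (memory stays bounded).
    partials = [summarize(chunk, question) for chunk in chunks]
    # Reduce step: combine the partial results in one final model call.
    return summarize("\n".join(partials), question)
```

Because only one chunk (plus the accumulated partials) is in the window at a time, the total input length is in principle unbounded.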
Mianbi Intelligence also released a RAG suite: the MiniCPM-Embedding model, the MiniCPM-Reranker model, and MiniCPM3-RAG-LoRA, a version of the model fine-tuned for RAG scenarios.
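An embedding model and a reranker typically form a two-stage retrieval pipeline: the embedding model cheaply scores every document to fetch candidates, and the reranker re-scores that short list more carefully. The sketch below shows only the pipeline shape; the toy bag-of-words scorer stands in for both MiniCPM-Embedding and MiniCPM-Reranker, whose real APIs are not described in the source.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector, standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_and_rerank(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Stage 1: score all docs with the cheap embedding (retrieval).
    Stage 2: re-score the candidate pool (here with the same scorer,
    standing in for a dedicated reranker) and keep the best top_k."""
    q = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k * 2]
    return sorted(candidates, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]
```

In a real deployment the reranker is a separate, heavier model that reads the query and document jointly, which is why it is only applied to the small candidate pool.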