
Moore Threads open-sources audio understanding model MooER: trained and run on domestic full-featured GPUs

2024-08-24


IT Home reported on August 23 that Moore Threads has open-sourced a large audio understanding model, MooER (More Ear), the industry's first large-scale open-source speech model trained and served on domestic full-featured GPUs.

Running on the Moore Threads KUAE (Kua'e) intelligent computing platform, the MooER large model completed training on 5,000 hours of audio data with pseudo-labels in 38 hours.
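Pseudo-labeling generally means transcribing otherwise unlabeled audio with an existing ASR system and using its output as the training target. The sketch below illustrates that general idea; the `asr_model` object and its `transcribe()` method are hypothetical stand-ins, not MooER's actual data pipeline.

```python
# A minimal sketch of pseudo-label generation for untranscribed audio:
# run an existing ASR model over each file and keep its hypothesis as
# the training target. `asr_model.transcribe()` is a hypothetical API.
from pathlib import Path

def build_pseudo_labels(audio_dir: str, asr_model) -> dict[str, str]:
    """Map each audio file to a machine-generated transcript (pseudo-label)."""
    labels = {}
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        hypothesis = asr_model.transcribe(str(wav))  # hypothetical method
        labels[wav.stem] = hypothesis
    return labels
```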

MooER supports not only Chinese and English speech recognition but also Chinese-to-English speech translation. On the CoVoST2 Chinese-to-English test set, MooER-5K achieved a BLEU score of 25.2, approaching industrial-grade results.
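For context, BLEU scores of this kind are conventionally computed with the sacrebleu library. A minimal sketch follows; the example sentences are invented stand-ins, not CoVoST2 data.

```python
# Score translation hypotheses against references with sacreBLEU.
# The sentences here are made up; CoVoST2 supplies the real test pairs.
import sacrebleu

hypotheses = ["the meeting starts at nine tomorrow morning"]
references = [["the meeting begins at nine o'clock tomorrow morning"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # corpus-level BLEU on a 0-100 scale
```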

In this work, the Moore Threads AI team has open-sourced the inference code and the model trained on 5,000 hours of data, and plans to further open-source the training code and a model trained on 80,000 hours of data.

The model structure of MooER consists of three parts: an Encoder, an Adapter, and a Decoder (a large language model, LLM).
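As a rough illustration of this three-part layout, the sketch below wires an encoder, adapter, and LLM decoder together in PyTorch. The module names, dimensions, downsampling factor, and the frozen-encoder assumption are illustrative guesses, not the released implementation.

```python
# A minimal sketch of an Encoder -> Adapter -> LLM-decoder pipeline.
# All names and sizes are assumptions for illustration only.
import torch
import torch.nn as nn

class AudioAdapter(nn.Module):
    """Projects encoder features into the LLM's embedding space.
    The frame-stacking downsample factor is an assumption."""
    def __init__(self, enc_dim: int, llm_dim: int, downsample: int = 4):
        super().__init__()
        self.downsample = downsample
        self.proj = nn.Linear(enc_dim * downsample, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, enc_dim); stack adjacent frames to shorten the sequence
        b, t, d = feats.shape
        t = t - t % self.downsample
        feats = feats[:, :t].reshape(b, t // self.downsample, d * self.downsample)
        return self.proj(feats)

class MooERLikeModel(nn.Module):
    """Encoder (speech features) -> Adapter -> Decoder (LLM)."""
    def __init__(self, encoder: nn.Module, adapter: AudioAdapter, llm: nn.Module):
        super().__init__()
        self.encoder, self.adapter, self.llm = encoder, adapter, llm

    def forward(self, audio_feats: torch.Tensor, text_embeds: torch.Tensor):
        with torch.no_grad():                 # assume the speech encoder stays frozen
            enc_out = self.encoder(audio_feats)
        audio_embeds = self.adapter(enc_out)  # map into the LLM embedding space
        inputs = torch.cat([audio_embeds, text_embeds], dim=1)
        return self.llm(inputs)               # LLM consumes audio prefix + text prompt

# Usage with stand-in modules:
enc = nn.Identity()                       # stand-in for a real speech encoder
adapter = AudioAdapter(enc_dim=80, llm_dim=1024)
llm = nn.Linear(1024, 1024)               # stand-in for a real LLM
model = MooERLikeModel(enc, adapter, llm)
audio = torch.randn(1, 100, 80)           # (batch, frames, feature_dim)
prompt = torch.randn(1, 8, 1024)          # already-embedded text prompt
out = model(audio, prompt)                # -> (1, 100 // 4 + 8, 1024)
```

The specific parameter scale of each component is as follows: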