minimax holds first partner day conference, releases video and music generation models

2024-09-01


the first developer conference of ai unicorn minimax (shanghai xiyu technology co., ltd., hereinafter referred to as minimax), the "minimaxlink partner day," kicked off on august 31 with a live show combining technology and music. on the same day, minimax officially released its video model, video-01, and its music model, music-01.
multimodal models have become a must-answer question for large-model companies, and video models are the most visible example. many ai companies have already released video generation models, including "qingying" from zhipu ai, pixverse v2 from aishi technology, vidu from shengshu technology, and "keling ai" from kuaishou.
building out multimodal models is just the beginning
the video-01 model released by minimax reportedly focuses on native high-resolution, high-frame-rate video generation. entering a text prompt generates a five-second video, and users can try the product by logging in to the minimax official website.
minimax officially released its video model, video-01
after evaluating video-01, a product designer said, "the overall effect is very good, with good physics, dynamic range and stability, and a relatively accurate response to science fiction and fantasy concepts, but it looks very plastic. the aesthetics are relatively weak, and the picture quality and detail are somewhat lacking."
in response, yan junjie, founder and ceo of minimax, said that what is currently on display to the public is only an initial version of the product, and that updated versions will be released gradually.
for this reason, the video model will be free for users for a period of time, and commercialization will be considered only once the product has been updated to a satisfactory state. "future commercialization will mainly take two forms. one is based on the company's open platform and the more than 2,000 customer partners the company has accumulated. many well-known corporate users are also willing to use voice recognition capabilities. the other is to introduce advertising mechanisms in its own products."
minimax's current multimodal model lineup reportedly also includes music-01, a multifunctional end-to-end music generation model, and speech-01, a new-generation generative speech synthesis model, among others. "this is just the beginning. we will continue to improve the model's speed and quality, and will release further corresponding products," yan junjie said.
the key to improving model performance
"as a technology company, technology is always the core element," yan junjie said, adding that minimax's focus at this stage is not commercialization.
yan junjie said that minimax's model currently handles more than 3 billion customer interactions. a year ago, minimax's interaction time was only 3% of chatgpt's; that ratio has since risen to 53%. even so, the number of connected users stands at only 0.8% of the global population, still short of 1%. to grow from 1% to 100%, the most important thing is to increase the penetration rate and depth of use of ai products among users.
minimax user interaction data
many technical difficulties remain to be overcome, and the three most important optimization directions are continuously reducing the model's error rate, supporting unlimited input and output, and multimodality. "it is easy to see in everyday life that text interaction is only a small part, and that voice and video interaction make up more. multimodal content such as sound, pictures, and video has become the mainstream of information transmission. to increase penetration, multimodality is the only way," yan junjie said. to overcome these difficulties, he said, "fast" is the core research and development goal for minimax's underlying large model. "of two models with similar performance, the one with faster training and inference can use computing resources more effectively to iterate over more data, and so ends up with better model capability."
minimax has reportedly gone through two key changes to its underlying technology: moe (mixture of experts) and linear attention. in april this year, the company developed a new generation of models based on moe plus linear attention, which is considered comparable to gpt-4o. when processing 100,000 tokens, the new model's processing efficiency can be 2 to 3 times higher, and the longer the sequence, the more pronounced the efficiency gain.
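
for readers unfamiliar with linear attention, the following is a minimal sketch of the general idea, not minimax's actual implementation, which the reports do not describe. it assumes the common kernel formulation with an elu(x) + 1 feature map and no causal masking; all function names are hypothetical. standard attention builds an n x n score matrix, so its cost grows with the square of the sequence length n. the linear form reorders the same product so that only a small d x d summary is built, and cost grows in proportion to n. this is one plausible reason the gain widens as sequences get longer.

```python
import math

def matmul(a, b):
    """multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def feature_map(m):
    """elu(x) + 1, a positive feature map (an assumed, commonly used choice)."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in m]

def quadratic_attention(q, k, v):
    """reference form: builds the full n x n score matrix."""
    scores = matmul(feature_map(q), transpose(feature_map(k)))  # n x n
    out = matmul(scores, v)                                     # n x d_v
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]

def linear_attention(q, k, v):
    """same result, reordered so that no n x n matrix is ever built."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)            # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]   # length-d sum of key features
    out = matmul(fq, kv)                     # n x d_v
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, norms)]

# small deterministic example: 5 tokens, key dimension 3, value dimension 2
q = [[0.3 * i - 0.5 * j for j in range(3)] for i in range(5)]
k = [[0.2 * j - 0.1 * i for j in range(3)] for i in range(5)]
v = [[float(i + j) for j in range(2)] for i in range(5)]

a = quadratic_attention(q, k, v)
b = linear_attention(q, k, v)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print("largest difference between the two forms:", max_diff)
```

the two functions return the same values up to floating-point rounding; only the order of the multiplications differs. in a full model, attention is just one part of the computation, so an end-to-end gain such as the reported 2 to 3 times can be much smaller than the gap between the two attention forms alone.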
the abab7 series of text models, built on the new-generation technology, is reportedly set for official release in the next few weeks.
public reports show that minimax, founded in december 2021, has completed three rounds of financing. investors include tencent and mihoyo, among others, and its current valuation has exceeded us$2.5 billion.
the paper reporter yu yan and intern wang chun
(this article is from the paper.)