yan junjie, founder of minimax: the only thing you can do is to make yourself better | ai frontline

2024-09-20

editor’s note:

today, ai daily, a subsidiary of cailianshe and science and technology innovation board daily, officially launched the "insight・ai frontline" interview series. the series focuses on outstanding companies, entrepreneurs, leading scholars, investors and others in the field of artificial intelligence and large models at home and abroad, and presents the latest exploration, practice and thinking on how ai is empowering thousands of industries. insight・ai frontline: insight, pioneer, front line!

"science and technology innovation board daily", september 20 (reporter huang xinyi): as one of the six ai tigers, minimax has completed its a+ round of financing, with the latest round led by alibaba at $600 million. the company's valuation has exceeded $2.5 billion, and its investors include tencent, sequoia china, hillhouse capital, idg, mihoyo and others.

within minimax, colleagues only call each other by their nicknames. as for the founder yan junjie, employees call him io (nickname). yan junjie has served as sensetime's vice president, deputy director of the research institute, and cto of the smart city business group. in december 2021, on the eve of sensetime's listing, yan junjie left sensetime and founded minimax.

recently, minimax released its first ai high-definition video generation model, abab-video-1. in an interview with the science and technology innovation board daily and other media after the press conference, yan junjie revealed that the abab7 series of models, built on a new generation of technology, will be officially released in the next few weeks, with performance comparable to the gpt-4o model. asked about the difficulty of commercializing large models, yan junjie responded that this is indeed a test for the industry, and only companies that pass it can succeed.

"when no one uses a product, or when a product does not make money, you certainly cannot blame the users. most of the blame can only be placed on your own poor technology or poor product quality. this can be seen as a test for an industry. if you can pass the test, you will be able to succeed. if you cannot, (the company) should indeed be shut down."

yan junjie at the press conference

▌competition is inevitable, so you should maximize your strengths

with the rise of a new wave of artificial intelligence represented by gpt, more imagination space has been created for the realization of general artificial intelligence (agi).

yan junjie believes that agi is not something high-end; rather, it should become a part of everyone's life.

"for example, when people use tiktok or wechat, they don't realize that these are content distribution platforms for short videos based on recommendation algorithms. people simply think that tiktok and wechat have become a part of their lives. the same is true for agi. when ai becomes like a mobile phone, something everyone uses every day, agi will be realized. of course, this will take a long time, but i think we can work hard step by step."

as for the changes ai can bring within the next five years, yan junjie believes that, with the help of ai, everyone will greatly raise their iq ceiling and surpass the person with the highest iq in a conference room of hundreds of people. "of course, it is not certain how much the upper limit of human iq can be improved with the help of ai."

entering 2024, the competition over large models is becoming increasingly fierce. faced with rapid attacks from large companies, the survival space of start-ups is being squeezed.

"competition is inevitable," yan junjie sighed. "in some of china's well-developed industries, such as electric vehicles, mobile phones, and mobile internet, several companies have engaged in long-term and very fierce competition, which ultimately made chinese products lead the world. since other emerging industries have developed in this way, and large models are likely to generate great social value, there should indeed be a lot of competition. this is the objective law of development."

yan junjie believes that if a startup company cannot win in the fierce competition, it should be eliminated.

"when companies that are many times bigger than you start competing with you, you will realize that some things are useless, because theirs are hundreds or thousands of times better than yours. what we can do is to infinitely magnify (make stronger) the things that have the potential to become stronger. it boils down to two points: one is how to improve technology, and the other is how to better co-create with users. both of these require some very critical judgments and very long-term accumulation."

▌multimodal large models mean that the underlying infrastructure also needs to be upgraded

in the past few months, the competition for video generation models has been very lively. the video model vidu created by shengshu technology was launched; zhipu ai officially released the video generation model "qingying"; sensetime released the first controllable character video generation model vimi for c-end users; alibaba damo academy launched a one-stop ai video creation platform "xunguang"; kuaishou keling ai officially launched the web version and open-sourced the controllable portrait video generation framework called liveportrait...

recently, minimax also released its first ai high-definition video generation model. yan junjie believes that multimodal large models are the only way forward, because multimodal content is a major part of human communication.

"most of the content we read every day now is not text, but some kind of dynamic content. when you open xiaohongshu, it's pictures and text; when you open douyin, it's all videos; and even when you open pinduoduo to buy things, most of the time it's pictures. for human society, the core meaning of the large model is to do better information processing. most of the information is reflected in multimodal content, not in text. text is often the most essential part of it. in order to have very high user coverage and a very high depth of use, the only way is to output (multimodal) dynamic content, rather than just outputting pure text-based content. this is a very core judgment."

although multimodality is generally favored, the industry has also felt during its exploration that the research and development of video generation models is obviously more difficult than text models.

in this regard, yan junjie believes that video work is, most of the time, indeed more complex than text work. first, the context of a video is naturally very long: a single video involves tens of millions of inputs and outputs, which is naturally difficult to process. second, video data is very large: a 5-second video takes up several megabytes, while the text that can be read in 5 seconds may be less than 1k, a storage gap of several thousand times. the challenge is how to process, clean, and label this data on top of underlying infrastructure that was previously built for text. this means that the infrastructure also needs to be upgraded.

"in addition, more patience is needed. there are many open source things for text, and it would be faster to do research and development based on open source, but there are not so many open source things for video, so you need to do it all over again, which requires more patience."

▌objectively speaking, the price war increased the number of model calls

since the beginning of this year, many large model companies have started price wars to drive adoption of large ai models. yan junjie believes that, objectively speaking, the price war has indeed increased the number of model calls.

"when the domestic model price war started, most companies that originally thought large models were expensive began to find that large models were cheap and could be used with confidence. in the end, it was surprising to discover that after the price war over large models, many traditional enterprises became very willing to use them. they thought that the cost was low and it didn't matter if there were mistakes; they could just call the model once more. objectively speaking, this greatly increased the number of model calls."

in the face of fierce competition among domestic models, minimax is expanding into overseas markets. yan junjie said, "it is precisely because of the fierce competition among domestic models that we have to move forward. at least now we have reached a level comparable to gpt in non-english languages. since competition and various other things are unavoidable, we should simply do our best. we see the optimistic side. the use of large models in china is indeed growing significantly, and chinese models are becoming more and more competitive overseas. i think these are two positive changes."

in terms of the specific commercialization model, yan junjie explained that the company's commercialization takes two forms: one is the minimax open platform for the b-end, and the other is the advertising mechanism within its products.

"minimax open platform now has more than 30,000 corporate customers and developers, including well-known internet companies, traditional enterprises, etc. users will use our voice and visual capabilities. because not all companies can build these themselves, we are a very good partner. secondly, minimax products also have advertising mechanisms, which can be commercialized. however, at this stage, the most important thing is not commercialization, but to truly make the technology widely available."

at present, there are six independent large model startups in china, namely zhipu ai, baichuan ai, 01.ai, moonshot ai, minimax, and stepfun, which are called the "six little tigers of ai". in an interview with the media, zhu xiaohu, managing partner of gsr ventures (jinshajiang venture capital), said that large models are too expensive and cannot support themselves through commercialization, and that the best outcome for these startups is to sell to large companies.

talking about investor zhu xiaohu's remarks and the difficulties in commercializing large models, yan junjie responded that this is indeed a test for the industry, and only companies that pass this test can succeed.

"when no one uses a product, or when a product doesn't make money, you can't blame the users. most of the time you can only blame your own poor technology or poor product quality. that's how we look at it anyway."

in yan junjie's view, qq did not know how to make money in 2000; it tried numerous commercialization approaches that all failed, but eventually found mobile value-added services and games. everyone goes through this process. "this can be seen as a test for an industry. if you can pass the test, you can succeed. if you can't, you should really shut down (the company)."

throughout the interview, yan junjie always seemed calm about industry competition and corporate prospects. "we definitely can't blame the users or the ecosystem. we can only blame ourselves for not doing well enough. at least we have been working hard. we hope we can become better. this is the only thing we can do."

(science and technology innovation board daily reporter huang xinyi)