2024-09-25
In September, video generation models became a new battleground for China's tech giants. And this time, Zhang Yiming was late again.
It has been five days since Alibaba's Tongyi Qianwen text-to-video model went online, and three months since Kuaishou released Keling in June. On September 24, ByteDance finally launched its own Doubao video generation model.
What is worth noting is that Zhang Yiming, who is pragmatic and has always attached great importance to return on investment (ROI), set the tone of "commercialization" for the Doubao video model from the very beginning.
At the launch event, Volcano Engine president Tan Dai said the Doubao video generation model had been designed with commercialization in mind from the start. Target applications include e-commerce marketing, animation education, urban cultural tourism, and micro-scripted content such as music MVs, micro-films, and short dramas.
Caption: the result of entering "a little girl wearing a Santa hat holding a ragdoll cat." Image source: Zimubang (Alphabet List)
At the same time, Tan Dai emphasized that even before its official release, the Doubao video model had already been used in many short-drama projects on Douyin. Last month, Kunlun Wanwei released SkyReels, an AI short-drama generation platform, and in July, Meitu Xiuxiu released Moki, an AI short-film generation tool.
"Hundreds of short-drama companies have already become users of large AI models," said the head of a leading AI tool service provider. For large-model makers such as ByteDance, AI can cut the high cost of film and television production. With AI's support, short dramas and MVs will become content products with broader user participation, much like web fiction and short videos. In his view, "ByteDance, though late, is playing a game of commercial chess."
In fact, after Sora became a sensation, whether a company could launch a large video model became the new yardstick for measuring how advanced a large-model maker's technology was in 2024.
In this race to catch up with Sora, ByteDance, "neither hurried nor slow," dragged its feet until late September, when it upgraded Doubao Big Model Pro and "reserved a slot" for the video model.
When Zimubang opened Jimeng AI, it found that with the Doubao video model applied, consumer-side (C-end) users could now try video generation directly in Jimeng AI.
The longest clip it can generate is 12 seconds, which is "average." Compared with Keling's output, "it is not amazing, but despite being a few months late it has not been left behind by the leading video models." Zhang Yang, one of the first AI practitioners to test the Doubao video model, told Zimubang that although domestic video models are being updated in large numbers, the reason ByteDance arrived late may be that earlier AI video generation simply could not "amaze" users.
While domestic models were busy catching up with Sora, OpenAI demonstrated a new path of reinforcement learning on base large models with the launch of GPT-o1. OpenAI may be about to usher in an era of a valuation above one trillion yuan, and large-model makers will face a new match point.
Jimeng AI, previously launched by Jianying (CapCut), only supported 3-second videos; after loading the Doubao video model, it can generate videos of 3 to 12 seconds.
By comparison, version 1.0 of Keling only allows 5-second video generation without a membership, while ByteDance's Jimeng AI grants users 66 free points for each daily login.
However, unlike the Doubao language model, whose "zero-yuan purchase" campaign, priced 98% below the industry average, sparked heated discussion, the Doubao video model's debut does not quite fit ByteDance's tradition of "quietly doing big things" and appears a bit rough.
Enter the prompt "a little girl holding a ragdoll cat." In the beta version before the Doubao video model's release, the AI at first seemed to mistake the ragdoll cat for a doll: the generated video showed a little girl holding a fake cat, and her face looked somewhat stiff.
In the next generation attempt, on September 25, the ragdoll cat turned into an ordinary domestic cat. Only on the third attempt did the model follow the instructions accurately. Zhang Yang, one of the first AI practitioners in the internal test, told Zimubang that the Doubao video model's output was not amazing.
However, the Doubao video model can switch between styles such as 3D animation, 2D animation, Chinese painting, black-and-white, and impasto. Users can also choose random camera movements, or customize movements such as zooming in and out. Compared with Keling, which offers only three aspect ratios (16:9, 9:16, and 1:1), Doubao adapts to more formats, adding 3:4, 2:3, 4:3, 3:2, and other ratio options.
In Zhang Yang's view, Doubao does offer more choices in the user experience. Still, although the model can switch between multiple shots within a single prompt, "the overall transitions between frames are still a bit unsmooth, and the characters' expressions are somewhat distorted."
Nevertheless, it is no surprise that Zhang Yiming has once again engraved "pragmatism" into the genes of the Doubao video model.
As soon as it was released, the Doubao video model was launched for the enterprise market. Volcano Engine president Tan Dai reiterated that the model had been designed with commercialization in mind from the start, with target applications including e-commerce marketing, animation education, urban cultural tourism, and micro-scripted content such as music MVs, micro-films, and short dramas.
Unlike AI startups that "look for nails with a hammer," both ByteDance and Kuaishou "have content and platforms; with nails already in hand, building large video models naturally gives them more application scenarios," said Zhang Yang.
On July 24, Keling AI's official WeChat account revealed that over one million users had applied for access. The same day, it launched a paid membership system with three tiers (gold, platinum, and diamond), with annual fees ranging from over 500 yuan to over 5,000 yuan. The late-arriving ByteDance may be on par with Keling in technology, but in commercialization, Keling, which has already begun charging consumers, seems one step ahead again.
In May, asked about OpenAI releasing GPT-4o the day before Google I/O, Sundar Pichai, CEO of Google's parent company Alphabet, said, "We are at the inflection point of AI, and I see opportunities. If you extend the timeline, what happens on any particular day becomes irrelevant."
Like Google, which is often overtaken by OpenAI, the late-arriving ByteDance, with nails in hand, seems to be aiming to come from behind.
According to QuestMobile, as of July the monthly active users of AI apps had exceeded 66.3 million. Among them, Doubao, Wenxiaoyan, Kimi, Xingye, and Tongyi ranked in the top five, with 30.42 million, 10.08 million, 6.25 million, 4.66 million, and 4.24 million monthly active users respectively.
Although the Doubao app launched notably later than Alibaba's Tongyi Qianwen, and even later than Baidu's Wenxin Yiyan and Kimi, its monthly active user base already exceeds those of the other four apps combined.
So in AI video generation, facing slow technological breakthroughs at home, ByteDance seems to have the confidence to arrive late.
Whether it is Keling, which came out first, or ByteDance's late-arriving Doubao video model, none of the makers that launched video models between July and September seems able to catch up with Sora.
From Kuaishou's "The Mirror of Mountains and Seas: Cutting Through the Waves" to ByteDance's "Sanxingdui: Revelation of the Future," using AI to make short dramas has become the "philosopher's stone" by which leading makers showcase their AI video generation.
Obviously, compared with traditional short dramas that require real actors to appear and interact, genres such as mythology and science fiction are better suited to today's large AI models.
"The current level of AI generation is unstable. Effects like large explosions and fireworks are hard to tell from the real thing, but they still require staff to prepare the shots and then spend one to two hours adjusting them," Zhang Yang told Zimubang. Videos generated by today's large AI models still suffer from unnatural expressions, limited movement, and mechanical delivery.
Zhu Jiang of the AI short-drama platform Reel.ai also said in an interview, "Non-animated short dramas are expected to reach a consumer-ready level in the second half of this year."
Robin Li (Li Yanhong) once said, "It doesn't matter whether you are 12 months ahead or 18 months behind. Every company is in a fully competitive market; whatever you do, there are many competitors."
With Douyin and its hundreds of millions of users, ByteDance's calm is not hard to explain. Even Tencent, which has yet to release a large video model, has WeChat, the largest social app. For Zhang Yiming and Ma Huateng, who "hold the nails in their hands," there seem to be more options.
"Whichever video model you use now, it's all a gacha draw."
"Roughly one generation in ten truly reaches commercial standards, but debugging ten times may be less efficient than doing the work by hand." After trying several large video models on the market, film and television practitioner Shan Shan said bluntly that current large models have not met user expectations in generation quality.
"You ask for a video of a ragdoll cat, and the result is understood as either a toy cat or an ordinary domestic cat. When users cannot get stable results that meet expectations within two or three tries, real user retention is hard to achieve." In Shan Shan's eyes, this may also explain why Sora still has no public beta more than half a year after its unveiling.
At the beginning of the year, there were reports that OpenAI CEO Sam Altman would raise as much as $7 trillion and work with TSMC to build wafer fabs, intending to bypass Nvidia with self-developed chips. In September, it was revealed that TSMC was developing a customized A16 angstrom-class process chip for OpenAI's Sora video model to boost its video generation capabilities.
The A16 chip's density is 1.10 times higher; at the same operating voltage its speed increases by 8%-10%, and at the same speed its power consumption falls by 15%-20%. Using "lower cost and energy consumption to drive faster AI video generation" is evidently an important reason OpenAI has postponed Sora's public beta.
Better AI video generation demands ever greater computing power, so bringing down its cost and energy consumption has become a key factor in whether domestic large video models can ultimately break through.
Recently, ByteDance was reported to be planning AI chip cooperation with TSMC. ByteDance later denied the report, saying its chip exploration focuses on optimizing its recommendation and advertising businesses. Yet a search for "chip" on ByteDance's recruitment site already turns up more than 200 related positions, including AI chip architect and chip SIL test engineer.
But for Zhang Yiming, and indeed for China's leading large-model makers, the challenges ahead may be even harder.
On September 19, at the 2024 Yunqi Conference, Yang Zhilin, founder of Moonshot AI (Dark Side of the Moon), said the main significance of GPT-o1's launch is that it raises AI's ceiling: "Whether it is a 10% boost to productivity or a tenfold boost to GDP, the most important question is whether we can keep scaling through reinforcement learning."
In the GPT-o1 era, when AI chat apps such as Doubao, Tongyi Qianwen, Wenxin, and Kimi evolve from generating an answer after 10 or 20 seconds of thinking to calling on various tools to carry out minute- or even day-long tasks, the AI chat product form that domestic users know will change dramatically. "AI becomes more like a human, or an assistant." This seems to have become Moonshot's new timetable for catching up with OpenAI.
As a new competitive moment arrives, the base models of domestic large-model makers have yet to make a "new splash," and Zhang Yiming and his peers once again face a choice.
Should they keep pouring "people, money, and computing power" into functional scenarios such as text-to-video, or follow OpenAI down the reinforcement-learning route? For ByteDance, which is not short of money, the answer can of course be "both."
When the imagination unleashed by reinforcement learning is large and attractive enough, and the new starting gun fires, can ByteDance, which failed to rise early, take the lead this time?
(Zhang Yang and Shan Shan are pseudonyms.)