
Doubao is on the march: ByteDance's answer to Sora is "late but here." Tan Dai, president of Volcano Engine: "We started thinking about commercialization as soon as it launched"

2024-09-26

"The development path of ByteDance's large models is to polish consumer-facing (to C) products first, and then expand into the enterprise (to B) market once the model capabilities have a competitive advantage," Tan Dai, president of Volcano Engine, said on September 25 in a group interview with media outlets including a reporter from the Daily Economic News.

Following this path, an early version was applied in May this year to Jimeng, the AI creation platform developed by the Jianying team. The Doubao video generation model was then officially unveiled at the 2024 Volcano Engine AI Innovation Tour on September 24 and opened to enterprise customers for invitation-only testing.

Since Sora ushered in the "ChatGPT era of video," domestic large-model players such as Kuaishou, Zhipu AI, MiniMax and Alibaba have launched similar products one after another. Now that ByteDance has entered the market, can it change the competitive landscape for large video models?

Based on demonstrations of the two video generation models at the press conference, Tan Dai believes the Doubao video generation model has reached an industry-leading level in semantic understanding, in complex scenes where multiple subjects move and interact, and in content consistency across multi-shot switching.

A music generation model and a simultaneous interpretation model were also released at the event, further expanding ByteDance's AI territory.

As the first player in the industry to cut large-model prices, Volcano Engine has significantly accelerated its commercialization. Tan Dai told the Daily Economic News reporter that the number of to B calls has grown rapidly. "I am not so clear about the to C business, but it feels very smooth. I think AI can solve problems end to end, and the boundary between to B and to C businesses is not so obvious."

The Daily Economic News reporter learned at the press conference that, as of September this year, average daily token usage of the Doubao large model has exceeded 1.3 trillion, with an average of 50 million images generated and 850,000 hours of speech processed per day.

ByteDance enters the AI video market, "considering commercialization from the moment of launch"

On August 31, MiniMax, one of the "six little dragons of AI," released the video generation model video-1. On September 19, Keling AI completed its ninth iteration and released the Keling 1.5 model. At the Yunqi Conference on the same day, Alibaba Cloud launched a new video generation model. In less than a month, the already fiercely competitive field of video generation models has welcomed another new player.

At the Volcano Engine AI Innovation Tour on September 24, two large models, PixelDance and Seaweed, were released together.

The Daily Economic News reporter noticed that the maximum video length supported by these two models has not yet been announced. The Jimeng app shows support for generating 3-second, 6-second, 9-second and 12-second videos. By comparison, Keling launched a video continuation feature on June 21, which extends a video by about 5 seconds and can produce videos of up to about 3 minutes.

The press conference. Image source: photo by Yang Xinyi, a reporter from China Business Network

"Different scenarios have different requirements for video length, and we pay more attention to solutions for different industries," Tan Dai told the Daily Economic News reporter in an interview. He said the main advantages of Doubao's video generation model within the industry are its ability to follow instructions, its consistency across multi-shot switching, and the generalization capability of its video generation.

At the press conference, several official demo videos showcased these capabilities. In one 10-second video of a man and a woman riding horses, for example, the two had different expressions and movements, yet both moved naturally and smoothly.

Notably, the Doubao video generation model can generate content in a range of styles, including black and white, 3D animation, 2D animation, Chinese painting, and thick-paint illustration.

"It is difficult for video generation models to produce different styles. Beyond the technology, it mainly depends on the richness of the data sources," a technical specialist working on large models told the Daily Economic News reporter. Tan Dai attributed the capability to "the advantages of full-stack capabilities, technological breakthroughs, and Douyin's and Jianying's understanding of video."

Taking a pragmatic line, Tan Dai said the new Doubao video generation model "has been considering commercialization since its launch," with application areas including e-commerce marketing, animation education, urban cultural tourism and micro-scripts.

Keling is also "anxious" about commercialization. On the second-quarter earnings call on the evening of August 20, Cheng Yixiao, co-founder, chairman and CEO of Kuaishou, called the commercialization of Keling a top priority, "striving to achieve a considerable scale of commercialization and monetization as soon as possible."

On pricing strategy, Tan Dai revealed that the price of Doubao's video generation model has not yet been set. "The application scenarios of video models and language models are different, and the pricing logic is also different. The value of the product should be measured by the new experience, migration costs and so on. Whether (the product) can ultimately be widely used also depends on whether its productivity ROI (return on investment) is much higher than before."

"Price is no longer a threshold for innovation." Are cloud vendors entering a new battlefield in the AI era?

In addition to the new video generation model, the event also saw the release of the Doubao music model and simultaneous interpretation model. The Doubao omnimodal large model family now covers three categories (large language models, large vision models and large speech models), with a total of 13 large models released so far.

However, models alone are not enough. Many industry insiders say that large-model vendors' current approach to deployment is like "looking for nails with a hammer." How to find the nails, and how to pick the right hammer to drive them more easily, may become new challenges for cloud vendors in the AI era.

The first challenge is cost, both for large-model vendors and for their enterprise customers.

At the press conference in May, Tan Dai announced that the inference input price of Doubao's main model was only 0.0008 yuan per thousand tokens, 99.3% cheaper than the industry average, kicking off a price war in the large-model field.

"Cost is the key. If the price drops to one-tenth, the volume may increase tenfold." In Tan Dai's view, model call volume and application coverage are the current focus. "We focus mainly on application coverage rather than revenue. We believe that unlocking new scenarios is more valuable, such as upgrading scenarios in chat, companionship and productivity, and expanding enterprise application scenarios."

However, he also insisted that any business serving the enterprise market must be sustainable. "We cannot count on advertising for profit the way to C businesses do. To B products must achieve a positive gross profit. We have the ability and confidence to do so."
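For a rough sense of scale, the two May pricing figures quoted above (0.0008 yuan per thousand input tokens, 99.3% below the industry average) together imply an industry average of roughly 0.11 yuan per thousand tokens. The short Python sketch below works through that arithmetic. The function names and the one-million-token workload are illustrative assumptions, not figures from Volcano Engine, and the calculation covers input pricing only.

```python
# Back-of-the-envelope check of the pricing figures quoted in the article.
# Function names and the sample workload are hypothetical, for illustration only.

DOUBAO_PRICE_PER_1K = 0.0008   # yuan per thousand input tokens (from the article)
DISCOUNT_VS_AVERAGE = 0.993    # "99.3% cheaper than the industry average"


def implied_industry_average(price: float, discount: float) -> float:
    """Average price implied by a price that sits `discount` below that average."""
    return price / (1.0 - discount)


def input_cost(tokens: int, price_per_1k: float) -> float:
    """Cost in yuan of `tokens` input tokens at a per-thousand-token price."""
    return tokens / 1000 * price_per_1k


average = implied_industry_average(DOUBAO_PRICE_PER_1K, DISCOUNT_VS_AVERAGE)
sample_tokens = 1_000_000  # assumed workload, not a figure from the article

print(f"implied industry average: {average:.4f} yuan per thousand tokens")
print(f"1M input tokens at the Doubao price: "
      f"{input_cost(sample_tokens, DOUBAO_PRICE_PER_1K):.2f} yuan")
print(f"1M input tokens at the implied average: "
      f"{input_cost(sample_tokens, average):.2f} yuan")
```

Under these assumptions, one million input tokens would cost under 1 yuan at the quoted Doubao price, against a little over 110 yuan at the implied industry average.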

After Doubao's initial price cut, models such as Alibaba's Tongyi Qianwen and Baidu's Wenxin Yiyan also cut their prices one after another. At this year's Yunqi Conference, the three main Tongyi Qianwen models on the Alibaba Cloud Bailian platform were cut in price again. Alibaba Cloud CTO Zhou Jingren even said that "compared with the huge applications in the future, they are still too expensive."

On the current state of the industry, Tan Dai said that, judging from call volume after the price cuts, cost is no longer an obstacle to innovation. "What we need to do next is to improve the quality and performance of the model at this price. Quality means making the model more powerful and diverse."

After a round of across-the-board price cuts, the large-model industry will no longer compete on price alone. At this stage the competition is over model performance, a shift that customer demand also supports.

According to Tan Dai, demand for large-model deployment in the B2B market changes slowly, and the core demand is to reduce costs and increase efficiency. "When enterprises applied AI in the past, they planned from the top down, which had a high probability of failure. Now they need to innovate from the bottom up."

The Daily Economic News reporter noticed that, in the course of helping enterprises with digital transformation, Volcano Engine has joined hands with various parties this year to establish the Smart Terminal Large Model Alliance, the Automobile Large Model Ecological Alliance and the Retail Large Model Ecological Alliance. Its external customers now span more than 30 industries, including mobile phones, automobiles, finance, consumer goods and interactive entertainment.

ByteDance now has a few more "hammers" that suit it. Finding more matching "nails" across industries will be the next test for Volcano Engine.

Daily Economic News
