2024-09-27
Source: Times Weekly. Author: He Shanshan
The field of large video-generation models has gained an important new player.
On September 24, Volcano Engine, ByteDance's cloud division, held an AI Innovation Tour event in Shenzhen. It released two Doubao video-generation models, Doubao-PixelDance and Doubao-Seaweed, and opened invitation-only testing for the enterprise market.
For large video-generation models, the duration of the generated video matters a great deal. Currently, PixelDance generates videos of 5 or 10 seconds, and Seaweed generates 5-second videos. Tan Dai, president of Volcano Engine, told Times Weekly and other media: "There are many difficulties in video generation that still need to be overcome. Volcano Engine's advantages include instruction following and camera movement (keeping the subject consistent across multiple shots), which rest on technological breakthroughs and full-stack capabilities. In addition, Douyin's and Jianying's understanding of video is also an advantage."
Tan Dai believes that discussion of large video-generation models should not focus on duration alone; application scenarios also matter. Different scenarios have different duration requirements, and Volcano Engine is more concerned with solutions for different industries.
Notably, the new Doubao video-generation models are being tested on a small scale in Jimeng AI and will gradually be opened to all users.
In February this year, Zhang Nan, former CEO of Douyin Group, unexpectedly announced that he would move to Jianying (CapCut) and push the application of AI in video editing. Just one week after he announced taking charge of Jianying, on February 16, OpenAI launched Sora, which can generate one-minute videos, bringing text-to-video back into the global spotlight. Around the same time, Zhang Nan, as head of the editing business, announced the launch of Jimeng on WeChat Moments; Jimeng thus became his first major product move after the transfer.
At the AI Innovation Tour, Chen Xinran, head of AI marketing for Jianying and Jimeng, described the latest progress in the "AI-ization" of the two apps. She said that in the past, producing content of comparable quality required a team of 5 to 10 people to write storylines, polish special effects, and handle packaging and editing; the collaboration was complex, the production cycle ran 1 to 2 months, and substantial money and resources were needed. With AI, most creators can now complete a piece alone, and production time has shrunk to 1 to 2 weeks.
Tan Dai also said in his speech: "There are many difficulties in video generation that still need to be overcome. The two Doubao models will continue to evolve, explore more possibilities for solving key problems, and accelerate the expansion of AI video's creative space and applications."
In any case, the birth of the Doubao video-generation models and their use in Jimeng and Jianying mean that ByteDance is one step closer to using AI to upgrade its "old business" of video.
Source: Photo taken on site by a Times Weekly reporter
Switching shots freely
According to figures shared on site by Volcano Engine, usage of the Doubao models is growing rapidly.
As of September, average daily token usage of the Doubao language model has exceeded 1.3 trillion, a tenfold increase since its launch in May. Daily multimodal processing volume has reached 50 million images and 850,000 hours of speech.
Alongside its large user base, the Doubao model family has again expanded: in addition to the new video-generation models, a Doubao music model and a simultaneous-interpretation model were released, giving full coverage of modalities including language, speech, images, and video.
Previously, most video-generation models could only carry out simple instructions. The Doubao video-generation models can produce natural, coherent multi-shot action and complex interactions between multiple subjects: they not only follow complex instructions, but also let different characters carry out multiple action instructions in interaction with one another. Appearance, clothing details, and even headwear remain consistent under different camera movements, approaching the effect of live-action footage.
The Doubao video-generation models are based on the DiT (Diffusion Transformer) architecture. Through an efficient DiT fusion computing unit, generated video can switch freely between large-scale motion and moving camera shots, with multi-shot camera-language capabilities such as zooming, orbiting, panning, and target following. "This means that video generated by Doubao overcomes the consistency problem in multi-scene switching, and can simultaneously maintain consistency of subject, style, and atmosphere across shot changes. This is a technological innovation unique to the Doubao video-generation model," Tan Dai said.
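ByteDance has not disclosed how its "DiT fusion computing unit" works, but the underlying DiT design is published (Peebles & Xie, 2023). For readers unfamiliar with it, below is a minimal PyTorch sketch of a single DiT-style block, in which an embedding of the diffusion timestep and prompt modulates the normalization layers (adaptive LayerNorm). The class name, dimensions, and layer choices here are illustrative assumptions, not ByteDance's implementation.

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One transformer block with adaptive LayerNorm (adaLN) conditioning.

    Hypothetical sketch of the generic DiT design; not Doubao's actual code.
    """

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        # Norms carry no learned affine params; scale/shift come from `cond`.
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Maps the conditioning vector to shift/scale/gate for each sub-layer.
        self.ada = nn.Linear(dim, 6 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) patchified video latents
        # cond: (batch, dim) embedding of diffusion timestep + text prompt
        s1, sc1, g1, s2, sc2, g2 = self.ada(cond).unsqueeze(1).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + sc1) + s1
        x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + sc2) + s2
        x = x + g2 * self.mlp(h)
        return x

# Example: 16 latent tokens of width 256, conditioned on one embedding.
block = DiTBlock(dim=256, num_heads=8)
out = block(torch.randn(2, 16, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 16, 256])
```

In this scheme, the conditioning signal steers every block of the denoising network rather than being injected only at the input, which is one reason DiT-family models can hold subject and style steady across shot changes; whether Doubao's consistency gains come from this mechanism or elsewhere is not public.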
Regarding the models' future direction, Tan Dai said Volcano Engine pays more attention to better deployment and faster innovation on top of existing models. "Technology must meet user needs, and new and old technologies must be constantly adjusted and adapted. The standard of maturity for a large model is genuine, positive feedback from users who have actually experienced it at meaningful scale, not feedback from the laboratory. For example, Jimeng and Doubao run a large number of internal tests, and user feedback is an important evaluation criterion."
Previously, the Doubao large model set token pricing lower than 99% of the industry, making Volcano Engine the first to kick off a wave of price cuts. Pricing for the Doubao video-generation models has not yet been announced. Tan Dai told Times Weekly and other media that video models and language models have different application scenarios, so the pricing logic differs as well: "new experience minus old experience minus migration cost" must be considered, and whether the models see wide adoption ultimately depends on how much they improve productivity ROI compared with what came before.
Source: Jimeng official website
Exploring AI-native products
Previously, ordinary Jimeng users could generate 3-second AI short videos, while VIP users could extend them by a further 3 seconds.
Starting in March this year, Jianying has rolled out intensive updates to its AI features, such as smart subtitles and video translation, and has linked up with Douyin to support short videos made with Jianying's AI features through traffic and cash rewards; outstanding works, for example, can receive DOU+ traffic support worth 500 yuan per video. Currently, Jianying's VIP membership costs 218 yuan per year, averaging about 18.17 yuan per month, while Jimeng's VIP membership costs 69 yuan per month.
At the AI Innovation Tour, Chen Xinran noted that "the technology behind the Doubao large models has been applied to Jianying, Jimeng AI, and Xingtu," and introduced new features built on that AI technology.
For example, in digital-avatar applications, a digital human's voice can be customized online using voice-cloning technology. Creators only need to record or upload a 3-minute high-definition front-facing video, and voice cloning requires just 5 seconds of voice input to generate natural, fluent speech with no sense of incongruity, which can also be translated into multiple languages. "We care a great deal about privacy and security. We require users' personal confirmation in both product design and technology, and we will also track new industry regulations to improve the security and reliability of the service," Chen Xinran said.
There are also "content marketing" creation tools for e-commerce merchants. In the past, a merchant might spend hours browsing Douyin and TikTok to analyze popular video formulas, breaking them down and adapting the copy, and then hours more on editing. Now it takes only a few minutes: fill in the product name, upload materials or paste a product-page link, and multiple marketing videos in different styles can be generated with one click.
Chen Xinran specifically noted that beyond applying AI to existing products, Jianying is also exploring the possibility of AI-native products in the GenAI (generative artificial intelligence) era. "Jimeng AI is an exploration in this direction. The product is currently being connected to the two video-generation models for internal testing, to polish scenes and effects. We believe creation should not be limited by production cost, style, or cultural background; it should be fun, joyful, and free."
Tan Dai also said that the cost of deploying large models has largely been solved: "Large models need to move from competing on price to competing on performance, with better model capabilities and services."