2024 CIFTIS｜Shengshu Technology tackles the problem of inconsistent generation in large video models

2024-09-15

"When we give an AI an instruction and ask it to generate a video, what we really want is for the AI to help us tell a complete story. To achieve that, the core elements must stay consistent and controllable." At the recently held 2024 China International Fair for Trade in Services (hereinafter "CIFTIS"), Tang Jiayu, chairman and CEO of Shengshu Technology, offered a solution: the "subject reference" feature of the large video model Vidu, which can generate any subject consistently. The industry has tried other routes to the same goal, such as "AI generates images first, then the images generate video", but subject reference both reduces the workload and frees video content from the constraints of storyboard frames. Technical breakthroughs of this kind give the commercialization of large video models more room for imagination.

When large language models took off, Shengshu Technology focused on the multimodal track and launched its text-to-video capability in January 2024. Under the company's original plan, video capabilities would be developed step by step toward longer durations and higher consistency, but the arrival of Sora pushed the startup's schedule forward.

At the end of April, Vidu was released with one-click generation of 16-second high-definition videos. In June, it added one-click generation of 32-second videos, simultaneous sound-effect generation, and reconstruction of 4D video from a single generated video. At the end of July, Vidu officially launched worldwide, opening up image-to-video, character consistency, and generation of videos up to 8 seconds long.

This time, at the 2024 CIFTIS, Tang Jiayu focused on Vidu's latest feature, "subject reference". With it, a user uploads a picture of any subject, and Vidu locks onto that subject's appearance, switches scenes freely according to the text description, and outputs a video in which the subject stays the same. "Any" is the key word: whether the subject is a person, an animal, a product, an anime character, or a fictional creation, the feature keeps it consistent and controllable throughout video generation.

Beijing Business Daily reporters learned that, even before this feature launched, large video models were not without ways to achieve this goal. Capabilities such as "image-to-video" and "character consistency" could also get there.

Take the approach of generating images with AI first and then generating video from those images. A creator can use an AI drawing tool such as Midjourney to produce storyboard frames, keeping the subject consistent at the image level, then convert those images into video clips and edit them together.

The problem is that AI-generated images are not perfectly consistent, and the gaps often have to be fixed through repeated revisions and partial redrawing. More importantly, real video production involves many scenes and shots. For scenes with multiple components, this approach demands a huge amount of image-generation work, which can account for more than half of the entire process. The final video also loses creativity and flexibility because it relies too heavily on storyboard frames.

Vidu's "subject reference" feature generates video footage directly from "an uploaded subject image + a text description of the scene". This greatly reduces the workload and frees video content from the constraints of storyboard frames, letting creators produce rich, flexible, and varied video content from text descriptions.

Shi Yuxiang, a director at China Central Radio and Television and an AIGC artist, shared the creative process behind the animated short film "Summer Gift". He said that, compared with the basic image-to-video function, "subject reference" breaks free of the constraints of static images, and the generated visuals are more expressive and less restricted, which greatly improves the coherence of the work. It also saved him about 70% of the image-generation workload.

Beijing Business Daily reporter Wei Wei
