
The same image can appear in different scenes: video model Vidu receives a major update

2024-09-15


On September 11, Vidu, the self-developed video model jointly built by Shengshu Technology and Tsinghua University, received a major update: a "subject consistency" feature that can generate any subject consistently, making video generation more stable and controllable. The feature is currently free for users.
Subject reference feature launched to solve the character consistency problem
At present, a widely criticized problem in both image and video generation is subject consistency: the same prompt fed twice to the same large model produces different content. In narrative works, this leads to an inconsistent protagonist, which has become one of the most visible gaps between AI-generated and human-made works.
To address this, the industry has tried a "generate images first, then turn images into video" workflow: using AI drawing tools such as Midjourney to generate storyboards, keeping the subject consistent at the image level first, and then converting those images into video clips for editing and synthesis.
But the consistency of AI-drawn images is itself imperfect and often requires repeated modification and partial redrawing. More importantly, real video production involves many scenes and shots; for scenes with multiple components, this method demands a huge volume of source images, which can account for more than half of the entire workload. Over-reliance on storyboard images also leaves the final video short on creativity and flexibility.
At a media open day held on September 11, Shengshu Technology demonstrated the "subject reference" feature: a user uploads a picture of any subject, and Vidu locks onto that subject's appearance, switches scenes freely according to text descriptions, and outputs videos featuring the same subject.
The feature is not limited to a single type of object but targets "any subject": a person, animal, product, anime character, or fictional creation, ensuring its consistency and controllability in video generation. The company calls this a major innovation in the field and says Vidu is the world's first large video model to support the capability.
For characters, whether real or fictional, Vidu can use a "subject reference" to keep their appearance consistent across environments and shots. Tang Jiayu, chairman and CEO of Shengshu Technology, demonstrated this with the image of Lin Daiyu as played by Chen Xiaoxu: in a "Lin Daiyu drinking coffee" clip, the "same Lin Daiyu" appears across different scenes and outfits.
Tang Jiayu, chairman and CEO of Shengshu Technology, demonstrated the "subject reference" feature on site. Photo by Luo Yidan, Beijing News Shell Finance reporter.
AI video creation will become far more coherent; the era of complete AI narrative is coming
Using the feature, China Central Radio and Television director and AIGC artist Shi Yuxiang (Sennhai Fluorescence) created an animated short film, "Summer Gift". Sharing his creative process, he said that compared with the basic image-to-video workflow, "subject reference" breaks free of the constraints of static images, producing footage that is more expressive and free and greatly improving creative coherence. It saved him about 70% of the source-image workload, significantly improving efficiency and letting him focus on polishing the story rather than generating image material; the consistency also makes post-editing easier.
Shi Yuxiang, a director at China Central Radio and Television and an AIGC artist, demonstrated an animation created with the "subject reference" feature; the protagonist's appearance remains stable throughout. Photo by Luo Yidan, Beijing News Shell Finance reporter.
Tang Jiayu said the launch of "subject reference" marks the beginning of complete AI narrative, and AI video creation will move toward a more efficient and flexible stage. Whether for short videos, animation, or commercials, in the art of storytelling a complete narrative system is an organic combination of elements such as a consistent subject, consistent scenes, and a consistent style.
For a video model to achieve narrative integrity, then, it must be fully controllable across these core elements. The "subject reference" feature is an important step for Vidu on consistency, but only the first: going forward, Vidu will explore precise control over more complex elements such as multi-subject interaction, style unification, and stable scene transitions to meet higher-level narrative needs.
In the longer term, he said, once full controllability is achieved, the video creation industry will undergo a disruptive change. Characters, scenes, styles, and even camera work and lighting effects will become flexibly adjustable parameters; users will be able to complete a video work simply by adjusting those parameters, and behind each work will stand the user's own worldview and self-expression, built on AI.
Beijing News Shell Finance reporter Luo Yidan; editor Wang Jinyu; proofreader Yang Li