
Tang Jiayu, CEO of Shengshu Technology: Video generation is still in its early stages, with technical bottlenecks yet to be overcome

2024-09-12


"When using AI to make narrative films, the discard ratio may be 50:1. That is, out of 50 shots generated, only one may be suitable for this type of narrative work." A film and television creator made this remark on September 11 while speaking at a media open day held by Beijing Shengshu Technology Co., Ltd. (hereinafter Shengshu Technology).
With the development of large generative models, more and more film and television creators have begun using AI in their work, but many pain points remain.
"AI-generated videos are uncontrollable. Once there are too many elements, the model cannot make sense of multiple characters and spatial scenes," said Vicky, an AI film and television creator. Many AI film and television creators from China and abroad said that the core problem they commonly face in practice is insufficient controllability or consistency, especially in complex and interactive scenes.
Although AI video models follow instructions well, their output is still unpredictable, and it may take several attempts to generate a satisfactory shot. In addition, AI video generation models remain limited in camera movement, lighting effects, and detail handling, which makes fully fine-grained control difficult.
Shengshu Technology officially launched its AI video generation model, Vidu, on July 30 this year. To help creators work more efficiently, the company recently upgraded Vidu and released the "subject reference" function. Developed to address the consistency problem, the function can generate any subject consistently, making video generation more stable and controllable.
The "subject reference" function lets users upload a picture of any subject. Vidu then locks onto the subject's appearance, switches scenes freely according to the text description, and outputs videos that keep the same subject.
On September 11, a reporter from The Paper (www.thepaper.cn) logged into the Vidu platform on Shengshu Technology's official website and tried generating videos. In the first test, the reporter uploaded a 3D picture of American movie star Leonardo DiCaprio and entered keywords such as "blue sky", "wine glass", and "toast".
[Video: clip generated by Vidu, 00:04]
In the second test, the reporter uploaded a 2D screenshot of the heroine of the Japanese anime "youth in motion" and entered keywords such as "running", "late", and "morning".
[Video: clip generated by Vidu, 00:04]
Tang Jiayu, co-founder and CEO of Shengshu Technology, said in an interview that Vidu's "subject reference" function is the world's first technology with consistent generation capabilities. Shengshu Technology's core mission is to build a multimodal large model. AI video generation is still in its early stages, he said, and more technical bottlenecks remain to be overcome. He believes AI video technology will not remain a tool for a niche group, and estimates that by the end of this year it will reach the general public in a form that is easy to use.
Shengshu Technology was founded in March 2023. Its core team members come from the Institute of Artificial Intelligence of Tsinghua University. Its chief scientist, Zhu Jun, is a professor at Tsinghua University. Its co-founder and CEO, Tang Jiayu, holds a master's degree from Tsinghua University, where he studied in the Natural Language Processing Laboratory. He previously served as vice president of Ruilai Wisdom and as a senior product manager at Tencent Youtu Lab.
The Paper reporter Yu Yan and intern Wang Chun
(This article is from The Paper. For more original content, please download "The Paper" app.)