2024-08-19
Synced
Author: Wu Xin
On August 19, Kunlun Wanwei released SkyReels, the world's first AI short drama platform to integrate video and 3D large models. The platform combines script generation, character customization, storyboarding, plot, dialogue and BGM, and final film synthesis, letting creators produce high-quality AI videos with one click. Below is a 2.5-minute short drama.
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
"We should let go of our rigid and resistant mindsets and embrace this fragmented information age." In early December 2023, speaking at the Beijing Film Academy Lecture Hall, director Huang Jianxin observed with some emotion that, compared with films, the rise of vertical-screen short dramas has truly achieved global export.
With single episodes running from a few minutes to just over ten, strong entertainment value, and fast-paced plots, short dramas repeatedly hit viewers' "satisfaction points". In just three years, by 2023, the Chinese short drama market had grown to the equivalent of 70% of the annual theatrical box office.
At the same time, a large number of standalone short drama apps launched globally. ReelShort, a short drama app from Chinese Online, climbed the US iOS rankings and even overtook TikTok, which had long dominated the list.
Short dramas have become one of the fastest-growing segments of film and television in recent years, and also a testing ground for new technologies. "One-click translation" and "AI face swapping" are already common, and many online literature companies have released content generation models that assist authors with writing. With the video generation capabilities of large AI models, a short drama that used to take three months to produce now takes only half a month.
A short AI-generated clip can draw millions of views once released, but producing a full short drama with AI still faces many challenges. Creators have to "jump" repeatedly between AI tools such as ChatGPT, Midjourney, Runway, ElevenLabs, ComfyUI, Adobe, and Jianying, and the results are still unsatisfactory.
Against this backdrop, Kunlun Wanwei, a Chinese artificial intelligence technology company with more than ten years of overseas experience, launched SkyReels, the world's first AI short drama platform to integrate video large models and 3D large models. This is not only a successful application of domestic large models to short dramas, but also heralds the era of "one-click drama" and "one person, one drama".
At the same time, this revolutionary tool is expected to bring about explosive growth in AI short drama user-generated content (UGC) and professional user-generated content (PUGC), and promote further rapid growth in the short drama content creation and consumption market.
1. Get to know SkyReels - the world's first AI short drama platform
SkyReels product operation demo
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
When I opened the SkyReels website, I immediately felt that the design of "AI Short Drama" was very different from other "AI Creative" platforms.
Both are driven by AI, but the "AI Creative" platform focuses on image and video generation, while SkyReels integrates script generation, character customization, storyboard design, video shooting and synthesis, completely replicating the industrial process of short dramas.
If you want a short drama to have good ratings, first of all, you have to have a good idea and turn it into a fun script. This is the most important thing.
Text creation is the comfort zone of large language models (LLMs), but what sets SkyReels apart is that its text model has been trained specifically for the task and knows better how to write scripts that are "satisfying" and draw traffic.
Just enter a concept or story idea and choose a creation type, such as romantic drama, and the system automatically generates a script with a complete structure and rich plot that meets the requirements.
Users can also upload ready-made scripts and let AI help polish and optimize them to improve the professionalism and readability of the scripts.
The system will automatically generate a script that meets the requirements based on the creative prompts, and will also summarize the biographies of the characters involved to prepare for subsequent character design.
Now that we have the script, the next step is to "find actors." On SkyReels, this step is called character design.
Usually, we would have an LLM write the character description first, then feed it into a text-to-image tool such as Midjourney to generate the character's look. To give the character a voice, we would then have to turn to an audio tool such as ElevenLabs.
Now you only need to open the page and enter your requirements (including the character's voice) to generate the character with one click, which greatly improves production efficiency.
Before filming starts, besides finding actors, the director also has to make storyboards. A storyboard breaks the whole story down into a series of consecutive pictures, each representing a specific scene or action.
Storyboards for Studio Ghibli's Spirited Away.
Directors without an art background used to need the help of artists to complete storyboard designs, which was very troublesome. Now they can have an LLM generate the text script for the storyboard and then use a tool such as Midjourney to draw it.
However, this method has an obvious drawback: it is difficult to keep characters and scenes consistent. For example, in the fully AI-produced short drama "Nuwa" launched on CCTV's AI channel, Nuwa looks different every time she appears, as if there were several Nuwas.
On SkyReels, AI will generate storyboard images and corresponding text scripts based on the script content with one click. You only need to wait 1-2 minutes to view the effect of each shot. If you are not satisfied, you can also adjust the storyboard effect by modifying the text (such as the scene or character action).
More importantly, with the support of self-developed technology, the storyboard images are not only high-definition and rich in details, but the characters and scenes can also maintain consistency and continuity in different storyboards.
Once the storyboards are done and the actors are in place, it is time for the "actual shooting" stage. This is currently the biggest bottleneck in AI film and television, because there are too few good "cameras" available.
The common practice is to use Pika or Runway to animate the images, but there are many problems. Image quality is poor, character movements are small or even implausible (eating noodles, for example), and scenes are often inconsistent. Sometimes a vehicle moves but its wheels do not turn, or water splashes while the surface stays still. Characters' lip movements do not match the dialogue, and their expressions are stiff.
By combining an AI 3D engine with a large video model, SkyReels can automatically convert storyboards into continuous video, with more vivid and consistent scenes and characters. It also supports 1080p video output at 60 frames per second, ensuring a good viewing experience.
In addition, the length of the video that can be generated at a time is up to 180 seconds, which is a significant breakthrough compared to Sora's 60-second video and Keling's 10-second video.
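As a rough illustration of what these per-generation limits mean in practice, the sketch below counts how many separate generations would be needed to cover the 2.5-minute short drama mentioned at the top of this article, using only the clip lengths quoted above. The helper name is hypothetical, and the frame count assumes the maximum clip length and the 60 fps output can be combined, which the article does not state. Treat it as back-of-the-envelope arithmetic only.

```python
import math

# Maximum clip length per generation, in seconds, as quoted in the article.
MAX_CLIP_SECONDS = {"SkyReels": 180, "Sora": 60, "Keling": 10}


def generations_needed(total_seconds: int, max_clip_seconds: int) -> int:
    """Hypothetical helper: minimum number of generations to cover a duration."""
    return math.ceil(total_seconds / max_clip_seconds)


drama_seconds = 150  # the 2.5-minute short drama
counts = {
    name: generations_needed(drama_seconds, limit)
    for name, limit in MAX_CLIP_SECONDS.items()
}

# Assumption: a full-length 180-second clip rendered at 60 frames per second.
frames_per_full_clip = 180 * 60

print(counts)
print(frames_per_full_clip)
```

Real productions would still cut and stitch individual shots, so the counts are lower bounds on the number of generations, not a description of any platform's workflow.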
Finally, the results of every step can be combined with one click to quickly generate the finished short drama. AI also generates and recommends suitable background music and sound effects based on the script's subject matter and specific scenes, and users can add them with one click.
Videos can be exported with one click and published to social media platforms such as TikTok with one click.
Supports one-click sharing of character designs.
2. Three layers of technological innovation behind "One-click drama"
Three layers of technological innovation act as the three pillars supporting SkyReels' "one-click drama": the self-developed script large model SkyScript, the self-developed storyboard large model StoryboardGen, and WorldEngine, an innovative platform that is the first in the industry to deeply integrate an AI 3D engine with a large video model.
The script model SkyScript is responsible for the "soul" of the short drama: the script. In fact, the large text model supports not only the script but the entire creative process.
Some short drama writers have tried generating scripts with ChatGPT, but found that the results lack emotional tension and plot twists and read as a pile of flat text. Kunlun Wanwei has built SkyScript-100M, a high-quality structured short drama dataset containing hundreds of millions of words. The dataset carries high-quality annotations of the plot rhythm, highlights, and emotional shifts of a large number of outstanding short dramas, and is designed specifically for script creation.
SkyScript script large model technical schematic diagram
The storyboard generation principle of the SkyScript script model.
Beyond learning the basic principles and general patterns of writing from massive amounts of data, creating a hit also requires identifying tried-and-tested "routines". Audiences show a clear preference for fast pacing, intense conflict, repeated suspense, and multiple reversals, and themes such as underdog comebacks, domineering wives, wealthy family feuds, time travel and rebirth, vampires, and werewolves never get old.
ReelShort's hit short drama "The Double Life of My Billionaire Husband" follows a marriage-first, love-later storyline. Each episode runs about 1 minute and 30 seconds. By around episode 12, the male and female leads have quickly grown closer through densely packed plot turns covering a vicious female supporting character, a contract marriage, a hero rescuing the heroine, and property disputes. Source: Guohai Securities' in-depth report on the overseas expansion of Chinese short dramas.
By carefully annotating the "highlights" that arouse strong audience interest, such as the protagonist's appearance, shot composition, and the emotions the characters express, SkyScript learned to attend to and generate these details.
In terms of model architecture, SkyScript also uses a multi-agent framework to ensure the professionalism and controllability of generated content. Through the collaboration of agents such as "creative people", "casting directors", "screenwriters", "novel authors", and "directors", it simulates the industrial production process to complete the script creation.
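The article does not describe how SkyScript's agents are implemented. As a minimal sketch of the general idea only, the Python below chains role-specific agents so that each one reads a shared draft and adds its own contribution. Every class, function, and field name is hypothetical, the order of the roles is an assumption, and the stub functions stand in for calls to a language model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    """Shared state that each agent reads and extends (hypothetical)."""
    idea: str
    notes: dict[str, str] = field(default_factory=dict)


Agent = Callable[[Draft], str]


def make_stub_agent(role: str) -> Agent:
    # A real agent would prompt a language model; this stub only records
    # which earlier roles had already contributed when it ran.
    def run(draft: Draft) -> str:
        seen = ", ".join(draft.notes) or "nothing yet"
        return f"{role} output for '{draft.idea}' (built on: {seen})"
    return run


# Roles named in the article, in an assumed order.
ROLES = ["creative", "casting director", "screenwriter", "novel author", "director"]


def run_pipeline(idea: str) -> Draft:
    """Run every role in sequence over one shared draft."""
    draft = Draft(idea=idea)
    for role in ROLES:
        draft.notes[role] = make_stub_agent(role)(draft)
    return draft


result = run_pipeline("contract marriage turns into love")
for role, note in result.notes.items():
    print(f"{role}: {note}")
```

Keeping each role behind the same `Agent` signature means a single step can be swapped or rerun without touching the others, which is one plausible way to obtain the controllability the article attributes to the multi-agent design.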
Quality assessment of SkyScript, a large script model.
Short dramas are, after all, a visual language. The other two layers of innovation, StoryboardGen and WorldEngine, therefore focus on the "flesh and blood" of short dramas: shooting.
Like SkyScript, the self-developed storyboard model StoryboardGen has also been trained with high-quality, professional storyboard examples from the real world. It is designed specifically for storyboards and is different from general image generation models.
Similarly, based on the multi-agent framework, the different elements of the storyboard (scenes, shots, characters, actions, etc.) are decomposed into multiple agents for processing, which greatly enhances the controllability and consistency of the storyboard production process.
Technical schematic diagram of StoryboardGen, a large storyboard model. Similar to film shooting and animation production, StoryboardGen based on a multi-agent framework breaks down the overall process into multiple agents, each of which is responsible for a specific capability, enhancing the controllability and consistency of the storyboard production process.
Suppose there is a script that describes a scene, such as a person walking in a park.
LLM Planner will first break down this script into two parts.
Among them, the global prompt is: "A person is walking in a sunny park";
Local prompt: "This person is a middle-aged male, wearing casual clothes, holding a coffee cup in his hand, and walking at a leisurely pace."
In the generation stage, different agents perform their respective duties. For example, the scene agent generates the background and layout of the park based on the global description; the character agent generates the image and actions of the male character based on the local description.
Finally, the Storyboard agent integrates these generated contents and generates the final storyboard based on all the descriptive information and conditions.
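The flow described above can be sketched as follows. This is an illustration under stated assumptions, not StoryboardGen's actual code: the `Plan` and `Storyboard` types and all function names are hypothetical, the planner is hard-coded with the article's example instead of calling an LLM, and the agents return text placeholders where a real system would return images.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    global_prompt: str  # overall scene description
    local_prompt: str   # character-level details


@dataclass(frozen=True)
class Storyboard:
    background: str
    character: str
    caption: str


def llm_planner(script: str) -> Plan:
    # Placeholder: a real planner would ask an LLM to split the script.
    return Plan(
        global_prompt="A person is walking in a sunny park",
        local_prompt=("A middle-aged male, wearing casual clothes, holding a "
                      "coffee cup, walking at a leisurely pace"),
    )


def scene_agent(plan: Plan) -> str:
    # Uses only the global prompt to lay out the background.
    return f"[background] {plan.global_prompt}"


def character_agent(plan: Plan) -> str:
    # Uses only the local prompt to define the character and action.
    return f"[character] {plan.local_prompt}"


def storyboard_agent(plan: Plan, background: str, character: str) -> Storyboard:
    # Merges the partial results together with all descriptive information.
    caption = f"{plan.global_prompt}. {plan.local_prompt}."
    return Storyboard(background, character, caption)


plan = llm_planner("a person walking in a park")
board = storyboard_agent(plan, scene_agent(plan), character_agent(plan))
print(board)
```

Because the scene and character agents read different parts of the plan, the character description can be edited and regenerated while the background stays fixed, which is the kind of consistency the article emphasizes.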
Quality assessment of the storyboard large model StoryboardGen.
In addition to controllability and consistency, in order to make the storyboards more expressive, StoryboardGen also greatly improves the complexity and detail accuracy of the pictures.
For example, StoryboardGen uses a progressive generation framework based on DiT to create the final image through multiple revisions and improvements. Compared with the traditional one-time generation model, this framework can make full use of the information generated in the intermediate process to generate higher quality and richer visual effects.
The third layer of innovation is WorldEngine, a platform that is the first in the industry to seamlessly connect 3D generation and video generation through layer fusion and other methods. It effectively gives creators a powerful "camera", or even a "studio".
WorldEngine combines the precise, controllable capabilities of a 3D engine (such as lighting simulation, physics simulation, 3D space, and real-time interaction) with the imaginative generation capabilities of large AI video models. The result is a new online hybrid video creation mode that moves video creation from fuzzy generation toward precise control.
Suppose you are making a scene where a Pikachu is having fun under a fountain. You can let Sky3DGen create an accurate fountain scene; at the same time, let the video model generate a realistic Pikachu.
Hybrid Generative Video Example
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
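The article gives no detail on how WorldEngine's layer fusion works. One standard way to combine a separately generated foreground with a rendered background is alpha compositing (the "over" operator), sketched below on a toy two-pixel frame. The function names and pixel values are hypothetical. This illustrates the general technique, not Kunlun Wanwei's method.

```python
Pixel = tuple[float, float, float]  # RGB components in [0, 1]


def over(fg: Pixel, alpha: float, bg: Pixel) -> Pixel:
    """Blend a foreground pixel over an opaque background pixel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite(fg_layer, alpha_mask, bg_layer):
    """Apply `over` pixel by pixel to equally sized 2D layers."""
    return [
        [over(f, a, b) for f, a, b in zip(fg_row, a_row, bg_row)]
        for fg_row, a_row, bg_row in zip(fg_layer, alpha_mask, bg_layer)
    ]


# Toy 1x2 frame: background from a 3D render, foreground from a video model.
background = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]  # blue fountain scene
foreground = [[(1.0, 1.0, 0.0), (1.0, 1.0, 0.0)]]  # yellow character
mask = [[1.0, 0.0]]  # the character covers only the first pixel

frame = composite(foreground, mask, background)
print(frame)
```

Where the mask is 1 the generated character replaces the rendered scene, and where it is 0 the 3D background shows through unchanged. Fractional mask values would blend the two at soft edges.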
We know that large video models such as Sora can easily generate almost realistic effects that are difficult for game engines to match, and they are full of imagination, but they do not understand the physical world and cannot accurately simulate some of the most basic physical interactions, such as breaking glass and eating noodles.
The advantage of game engines lies in their accurate simulation of real physical laws. Through complex mathematical models, they can create a virtual environment that is coherent in time and space and conforms to objective laws. This not only ensures the consistency and predictability of rendering results, but also demonstrates a deep understanding of three-dimensional space.
Kunlun Wanwei is one of the largest game development and operation companies in China, so it is not surprising that it developed its own Sky3DGen large model to complement the strengths of the large video model and offer creators a new hybrid creation mode.
On SkyReels, you can create various 3D scenes and shapes, and even character performances.
3D Prop Video Generation Example
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
3D scene video generation example
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
Character performance is one of the core elements of a short drama. Kunlun Wanwei has developed its own character performance generation model, ActorShow, which offers stronger controllable generation of lip movements, expressions, and body movements.
Video link: https://mp.weixin.qq.com/s/4w5eOquY6p2Z7pXIUuKf9w
Quality assessment of character performance generation models.
During the creative process, users can also freely define a 3D virtual shooting studio.
Want to shoot a story in the desert today? With a few clicks, the whole scene will become a vast desert. Want to shoot in the space station tomorrow? With a few more clicks, the surroundings will become the high-tech interior of the space station.
You can even place and move virtual cameras in the virtual studio you build, try various shooting angles, adjust the light, add special effects, and get very professional shooting results.
Because it relies on a 3D engine, WorldEngine achieves a dramatic cost reduction compared with conventional video generation. At the same time, generation speed and controllability improve by several orders of magnitude.
3. Betting on AI UGC to win a seat at the table
SkyReels, an AI short drama platform, is the latest member of Kunlun Wanwei's AI application layer product matrix.
Before this, the company had already built a diverse product lineup spanning AI search, AI music, AI video, AI social, and AI games, and some of these businesses have been commercialized.
As one of the earliest Chinese companies to explore the global market, with more than ten years of experience in content and entertainment, Kunlun Wanwei has seen that UGC (user-generated content) platforms have had lasting momentum in content and gaming. It also predicts that AIGC will not only diversify how IP is created across online literature, short dramas, animation, and games but, more importantly, greatly lower the threshold for content creation.
As the industry says, "Every time the threshold for producing content is halved, the number of people creating content will increase tenfold," which heralds huge market opportunities.
Kunlun Wanwei is therefore committed to building a comprehensive, IP-centered UGC platform on which everyone who creates with AI can complete the full IP loop. The company believes that only a tool that hides all technical details and generates content end to end is truly commercially valuable, which is the underlying logic of "one-click generation" products such as SkyReels.
Besides building AI UGC platforms at the application layer, Kunlun Wanwei is also developing a general-purpose large model as the foundation underneath. This stems from a simple yet profound insight: from a technical perspective, human knowledge is distilled in text, and specialized models for social networking, games, music, and video all depend on the support of a large text model.
Kunlun Wanwei's self-developed Tiangong model has been upgraded to version 3.0. Tiangong 3.0 uses a 400-billion-parameter MoE architecture and is currently one of the largest and most capable open-source MoE models in the world. On several authoritative multimodal benchmarks such as MMBench, Tiangong 3.0 surpasses GPT-4V, and it reaches world-leading levels on multiple evaluation metrics.
With a solid general large model as its foundation, Kunlun Wanwei has gradually extended its model capabilities horizontally into content and entertainment, from music, text-to-image, and video generation to short drama generation, successively launching the SkyMusic AI music large model, the Skywork-MM multimodal large model, the SkyScript script large model, the StoryboardGen storyboard large model, the Sky3DGen 3D large model, and others.
Fang Han, Chairman and CEO of Kunlun Wanwei, once predicted that just as cameras revolutionized the way of filming and gave birth to a large number of short video platforms such as Douyin and Kuaishou, AI will also give birth to a large number of new AI UGC platforms. He firmly believes that only the "free + to C" model can breed real giant companies in the AI era.
For Kunlun Wanwei, which has strong consumer-facing (to C) DNA, the rise of AIGC is undoubtedly a rare opportunity. The company has always aspired to become a leading artificial intelligence technology company and once thought it had missed its chance for a seat at the table, but AIGC unexpectedly opened a new door. It is now gearing up and going all out.
Beta application address: https://skyreels.ai/beta