2024-08-13
Machine Heart Report
Editor: Zenan, Jiaqi
To make a video with Clapper, you just need to be the director.
With the release of Sora, the video field seems to have entered the era of generative AI. To this day, however, we still cannot use OpenAI's official video generation tool, and people who can't wait have already started looking for alternatives.
In recent weeks, Clapper, an open-source video editing tool, has attracted attention.
Unlike many video generators offered by tech companies, Clapper is an open-source AI story visualization tool that was launched as a prototype a year ago. It is not designed to replace traditional video editors or modern AI editors that use 3D scenes as input.
The idea of Clapper is to bring together various generative AI technologies so that anyone can create videos with AI through an interactive, iterative and intuitive process. No external tools, filmmaking skills or AI engineering skills are required. In Clapper, you do not directly edit sequences of video and audio files. Instead, you iterate on your story with the help of AI agents by adjusting high-level, abstract concepts such as characters, locations, weather, time periods and styles.
Julian Bilcke, the author of Clapper, is an AI front-end engineer at Hugging Face. He said that, to keep working in this direction, he is also developing a director mode: the goal is to let users play videos in full screen, sit comfortably in a director's chair (or on a sofa), shout commands to the agent, and let the AI make the movie.
In recent days, Julian Bilcke has launched new features such as using large models to convert arbitrary text into timelines. Clapper has also gained popularity, with more than 1,100 stars on GitHub.
How to use
Since it is an open source tool, our main focus is of course on whether it is easy to use.
Do you remember AI expert Karpathy's experience of making an AI short video? It took this top expert an hour to turn the first three sentences of "Pride and Prejudice" into an animated version. Although there are only three sentences and three scenes, the workflow is far more complicated than that. He first used Claude to generate a series of image prompts based on the original text, then fed these prompts into a text-to-image model to generate the corresponding images, and then passed the images to a video model to animate them. The narration was handed to ElevenLabs, and finally all the clips were assembled in Veed Studio.
So, after Karpathy finished his work, he tweeted, saying: "Entrepreneurs, the opportunity has come! The market is in urgent need of an AI tool that can integrate and simplify these processes."
Clapper is a one-stop platform that integrates all these functions.
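The workflow Karpathy went through can be pictured as a chain of stages that every sentence passes through in order. The sketch below is only an illustration of that idea, not code from Clapper or from Karpathy: the `Shot` class and the four stage functions are hypothetical placeholders that return labeled strings where a real tool would call a language model, a text-to-image model, a video model and a text-to-speech service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Shot:
    """One scene, filled in step by step as it moves through the workflow."""
    sentence: str
    image_prompt: str = ""
    image: str = ""
    clip: str = ""
    narration: str = ""


# Hypothetical stand-ins: a real version would call an LLM, a text-to-image
# model, a video model, and a text-to-speech service at these points.
def write_prompt(shot: Shot) -> Shot:
    shot.image_prompt = f"illustration of: {shot.sentence}"
    return shot


def render_image(shot: Shot) -> Shot:
    shot.image = f"image<{shot.image_prompt}>"
    return shot


def animate(shot: Shot) -> Shot:
    shot.clip = f"clip<{shot.image}>"
    return shot


def narrate(shot: Shot) -> Shot:
    shot.narration = f"voice<{shot.sentence}>"
    return shot


STAGES: List[Callable[[Shot], Shot]] = [write_prompt, render_image, animate, narrate]


def make_video(sentences: List[str]) -> List[Shot]:
    """Run every sentence through each stage in order; shots stay in story order."""
    shots = [Shot(sentence=s) for s in sentences]
    for stage in STAGES:
        shots = [stage(shot) for shot in shots]
    return shots


if __name__ == "__main__":
    for shot in make_video(["First sentence.", "Second sentence.", "Third sentence."]):
        print(shot.clip, "+", shot.narration)
```

Done by hand, each stage lives in a different product and the results are copied between them. A one-stop tool keeps the whole chain in one place.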
Usually, making a short video involves the following steps. First you need a story and a script. Then you draw a storyboard based on the script, shoot or find footage based on the storyboard, put it together in editing software, add animation and special effects, and optionally add voice-over, background music or sound effects. This is how the division of labor in film and television production came into being: directing, camera work, editing, post-production and dubbing.
In Clapper, video production follows a different logic. Unlike editing software such as Premiere and Jianying, a track in Clapper does not correspond to video or image material but to a specific type of work.
Jianying's tracks
Clapper's tracks
When it comes to making videos with AI, we are the client. Clapper is like a crew made up of the strongest AI models in the industry. It has a series of "top-tier" models built in, such as GPT-4o and Claude 3.5 (Sonnet), and it acts like the executive director on the contractor's side, responsible for routing your requests to the corresponding "AI director".
As the picture above shows, the first track represents the storyboard, which communicates with the large model built into Clapper. It calls a text-to-image model through an API and has the AI storyboard artist generate the corresponding pictures, which serve as the basis of the video frames.
The text-to-image models above can be accessed through Clapper
Taking the sample clip provided by Clapper as an example, the remaining tracks correspond to the scene, narration, camera perspective, background music and sound effects. For this story set in a Western wasteland, you can ask ElevenLabs or Fal.ai to generate the sound of wind blowing through ruins or the explosions of a gunfight.
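One way to picture a timeline whose tracks stand for types of work rather than media files is sketched below. This is a hypothetical illustration and not Clapper's actual data model: the names `Role`, `Segment` and `Timeline` are assumptions, and each segment holds only a text prompt that a model would later turn into an image, a voice line or a sound.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Role(Enum):
    """Track types named in the article; the identifiers are illustrative."""
    STORYBOARD = "storyboard"
    SCENE = "scene"
    NARRATION = "narration"
    CAMERA = "camera"
    MUSIC = "music"
    SOUND = "sound"


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds, exclusive
    prompt: str   # a description of the work, not a media file


@dataclass
class Timeline:
    tracks: Dict[Role, List[Segment]] = field(
        default_factory=lambda: {role: [] for role in Role}
    )

    def add(self, role: Role, start: float, end: float, prompt: str) -> None:
        if end <= start:
            raise ValueError("segment must end after it starts")
        self.tracks[role].append(Segment(start, end, prompt))

    def active_at(self, t: float) -> Dict[Role, str]:
        """Return the first prompt covering time t on each track that has one."""
        active: Dict[Role, str] = {}
        for role, segments in self.tracks.items():
            for seg in segments:
                if seg.start <= t < seg.end:
                    active[role] = seg.prompt
                    break
        return active


if __name__ == "__main__":
    timeline = Timeline()
    timeline.add(Role.STORYBOARD, 0.0, 4.0, "wide shot of a ruined desert town")
    timeline.add(Role.CAMERA, 0.0, 4.0, "slow pan from left to right")
    timeline.add(Role.SOUND, 1.0, 3.0, "wind blowing through ruins")
    print(timeline.active_at(2.0))
```

Because every segment is a description, changing the story means editing a prompt and regenerating that piece, rather than re-cutting footage.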
Clapper has another feature that may be a big step towards the dream of "making movies just by talking". You can import a script directly into Clapper and carefully build a character for your protagonist in the "Story" column.
Taking The Wizard of Oz as an example, you can not only add more personalized descriptions for the characters, but also upload pictures to set the look of the heroine, Dorothy. This means you can invite any actor in the world to play the role. Even if you want to see an 18-year-old Leo play Dorothy, you can. Clapper's controls are detailed enough that you can adjust the age and voice of each character, the furnishings of each scene, the furniture in Dorothy's room, and the look of the houses in the Emerald City, the destination of the adventure.
Of course, you can also use AI to draw some atmosphere pictures first, which may further stimulate your inspiration and creativity.
However, although Clapper's features cover the needs of video production well, the results are somewhat unsatisfactory. The movements of the characters are a bit "ghostly" and do not follow the laws of physical motion, the overall video feels more like an animated slideshow that lacks transitions and continuity between shots, and the soundtrack sounds distinctly AI-generated, with no melody and some noise.
It may take a long time for generative AI to change the video production process, but the emergence of Clapper may offer a new approach for large companies that are still adding AI features to traditional video editing software.
References:
https://news.ycombinator.com/item?id=41221399
https://x.com/aigclink/status/1818111874531205216