news

sora surpassed again! meta's ai video model explodes onto the scene, making video editing easier than photoshopping a picture

2024-10-05


zuckerberg has been busy "stealing the limelight" around the world recently.

not long ago, he started his "second entrepreneurship" and showed off meta orion, the most powerful ar glasses yet, which he has been honing for ten years. although orion is only a prototype betting on the future, it still stole the limelight from apple's vision pro.

last night, meta once again stole the show in the video generation model track.

meta said that the newly released meta movie gen is the most advanced "media foundation model" to date.

one caveat first, though: meta officials have not yet given a clear timetable for public release.

officials say they are actively communicating and cooperating with professionals and creators in the entertainment industry, and expect to integrate movie gen into meta's own products and services sometime next year.

briefly summarize the features of meta movie gen:

it has functions such as personalized video generation, precise video editing and audio generation.

supports generating 1080p high-definition long videos of up to 16 seconds at 16 frames per second

capable of generating up to 45 seconds of high-quality and high-fidelity audio

achieves sophisticated and precise video editing from simple text input
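for reference, those specs imply a fixed frame budget per clip: duration times frame rate.

```python
# frame budget implied by the stated specs: duration * frame rate
duration_s = 16   # seconds per generated video
fps = 16          # frames per second
print(duration_s * fps)  # 256 frames per generated clip
```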

the demo was excellent, but the product isn’t expected to be officially available to the public until next year

say goodbye to "silent films" and focus on large, comprehensive functionality

broken down, movie gen has four major functions: video generation, personalized video generation, precise video editing and audio generation.

the text-to-video function has long been a standard feature of video generation models. however, meta movie gen can generate high-definition videos in different aspect ratios according to user needs, which it says is an industry first.

Text input summary: A sloth with pink sunglasses lays on a donut float in a pool. The sloth is holding a tropical drink. The world is tropical. The sunlight casts a shadow.

Text input summary: The camera is behind a man. The man is shirtless, wearing a green cloth around his waist. He is barefoot. With a fiery object in each hand, he creates wide circular motions. A calm sea is in the background. The atmosphere is mesmerizing, with the fire dance.

in addition, meta movie gen provides advanced video editing functions, allowing users to achieve complex video editing tasks through simple text input.

from the overall visual style, to transitions between clips, to more fine-grained edits, the model gives you plenty of freedom.

in terms of personalized video generation, meta movie gen also takes a big step forward.

users can upload their own images and have meta movie gen generate personalized videos that preserve the person's identity and motion.

Text input summary: A cowgirl wearing denim pants is on a white horse in an old western town. A leather belt cinches at her waist. The horse is majestic, with its coat gleaming in the sunlight. The Rocky Mountains are in the background.

from kongming lanterns to transparent colored bubbles, you can replace an object in a video with just one sentence.

Text input: Transform the lantern into a bubble that soars into the air.

although many video models have been unveiled this year, most of them can only produce "silent films": bland results that are hard to savor but a pity to discard. meta movie gen has not repeated that mistake.

Text input: A beautiful orchestral piece that evokes a sense of wonder.

users can provide video files or text content and let meta movie gen generate corresponding audio from these inputs. (ps: note the sound effect when the skateboard lands.)

and it can create not only single sound effects but also background music, or even a complete soundtrack for an entire video, greatly improving the overall quality of the video and the audience's viewing experience.

after watching the demo, lex fridman expressed his admiration succinctly.

many netizens once again mocked openai's perpetually-forthcoming sora, but more netizens who have been waiting eagerly began looking forward to the opening of test access.

meta ai chief scientist yann lecun also promoted the meta movie gen platform online.

the pie painted by meta is worth looking forward to

when meta movie gen was launched, the meta ai research team also published a 92-page technical paper at the same time.

according to reports, meta's ai research team relies on two foundation models to achieve this broad functionality: the movie gen video and movie gen audio models.

among them, movie gen video is a 30b-parameter foundation model for text-to-video generation that can produce high-quality hd videos up to 16 seconds long.

in the pre-training phase, the model learns from a large amount of image and video data to understand concepts of the visual world, including object motion, interaction, geometry, camera motion, and physical laws.

to improve generation quality, the model is then supervised fine-tuned (sft) on a small set of carefully selected high-quality videos with text captions.

the report shows that post-training is an important stage of movie gen video training; it further improves generation quality, especially for the personalization and editing of images and videos.

it is worth mentioning that the research team also compared the movie gen video model with mainstream video generation models.

because sora is not currently publicly available, the researchers could only compare against its publicly released videos and prompts. for other models, such as runway gen3, lumalabs, and kling 1.5, the researchers generated videos themselves through api interfaces.

and because the videos sora has posted vary in resolution and duration, the researchers cropped and trimmed movie gen video's outputs so that compared videos had the same resolution and duration.
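as a rough illustration of that normalization step (the function name and array layout here are assumptions for illustration, not meta's actual tooling), matching a clip to a reference resolution and duration might look like:

```python
import numpy as np

def match_reference(video: np.ndarray, ref_frames: int, ref_h: int, ref_w: int) -> np.ndarray:
    """Trim a (frames, height, width, channels) video to a reference
    frame count and center-crop it to a reference resolution."""
    f, h, w, _ = video.shape
    assert f >= ref_frames and h >= ref_h and w >= ref_w
    trimmed = video[:ref_frames]            # drop trailing frames
    top = (h - ref_h) // 2
    left = (w - ref_w) // 2
    return trimmed[:, top:top + ref_h, left:left + ref_w]  # center crop

# e.g. normalize a longer, larger clip against a shorter, smaller reference
clip = np.zeros((32, 108, 192, 3), dtype=np.uint8)
print(match_reference(clip, 16, 72, 128).shape)  # (16, 72, 128, 3)
```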

the results show that movie gen video's overall evaluation scores are significantly better than runway gen3 and lumalabs, slightly better than openai's sora, and on par with kling 1.5.

in the future, meta also plans to publicly release multiple benchmarks, including movie gen video bench, movie gen edit bench, and movie gen audio bench, to accelerate research on video generation models.

the movie gen audio model is a 13b parameter model for video and text-to-audio generation, capable of generating up to 45 seconds of high-quality and high-fidelity audio, including sound effects and music, and synchronized with the video.

the model adopts a flow-matching generative objective with a diffusion transformer (dit) architecture, plus additional conditioning modules for control.
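as a minimal sketch of the flow-matching idea (toy numpy code; the "model" below is a stand-in, not meta's actual network): training regresses the model toward the straight-line velocity from a noise sample to a data sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x1: np.ndarray) -> float:
    """One flow-matching training objective on a batch of data x1.

    Sample noise x0 and a time t, form the linear interpolant
    x_t = (1 - t) * x0 + t * x1, and regress the model's output
    toward the constant target velocity v = x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)      # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))  # per-example time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1            # point on the straight path
    v_target = x1 - x0                      # velocity of that path
    v_pred = model(xt, t)
    return float(np.mean((v_pred - v_target) ** 2))

# toy "model" that always predicts zero velocity
zero_model = lambda xt, t: np.zeros_like(xt)
batch = rng.standard_normal((4, 8))
print(flow_matching_loss(zero_model, batch) > 0.0)  # True: nonzero target velocity
```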

meta's research team also introduced an audio-extension technique that lets the model generate coherent audio beyond the initial 45-second limit. in other words, the model can produce matching audio no matter how long the video is.
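the paper's extension technique is not spelled out here, but one common way to go beyond a fixed window (purely an assumption for illustration, with a stand-in generator) is to produce audio in overlapping chunks, condition each chunk on the tail of the previous one, and crossfade the overlap:

```python
import numpy as np

SR = 16_000      # sample rate (assumed)
WINDOW_S = 45    # the model's window limit, per the article
OVERLAP_S = 5    # overlap used to stitch chunks (assumed)

def generate_window(context: np.ndarray, n_samples: int) -> np.ndarray:
    """Stand-in for the audio model: returns silence of the right length."""
    return np.zeros(n_samples)

def generate_long_audio(total_s: float) -> np.ndarray:
    """Chain fixed-size windows into arbitrarily long audio."""
    win = WINDOW_S * SR
    ov = OVERLAP_S * SR
    out = generate_window(np.empty(0), win)
    while len(out) < total_s * SR:
        context = out[-ov:]                       # condition on the tail
        chunk = generate_window(context, win)
        fade = np.linspace(0.0, 1.0, ov)          # crossfade the overlap
        out[-ov:] = (1 - fade) * out[-ov:] + fade * chunk[:ov]
        out = np.concatenate([out, chunk[ov:]])
    return out[: int(total_s * SR)]

print(len(generate_long_audio(120)) / SR)  # 120.0 seconds of audio
```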

yesterday, tim brooks, the head of openai's sora, announced his resignation and move to google deepmind, once again casting a shadow over the uncertain future of the sora project.

according to bloomberg, meta vice president connor hayes said that meta movie gen currently has no specific product plans. hayes revealed a significant reason for the delayed rollout.

generating a video from a text prompt with meta movie gen can currently take tens of minutes of waiting, which badly hurts the user experience.

meta hopes to further improve generation efficiency and bring the video service to mobile as soon as possible to better meet consumers' needs.

in fact, looking at the product itself, meta movie gen's functional design aims to be large and comprehensive, without the lopsided weak spots seen in other video models.

the most prominent shortcoming is that, like sora, it remains vaporware for now.

the vision is grand; the reality, for now, is thin.

you might say that, just as sora is being overtaken by large models from china, the competitive landscape in video generation may shift again by the time meta movie gen actually launches.

but at least for now, the pie meta has painted looks appetizing enough to keep people waiting.