
PixVerse V2 is here! Generate five "Sora-level" clips in one go, as the video-generation race takes off

2024-07-24



Author: Yoky

Email: [email protected]

"The competition is fierce!"

Since June, video generation products have seen an explosion. From Kling to Luma and Runway Gen-3, both model capabilities and product quality have grown increasingly competitive.

Just now, PixVerse launched version V2. Beyond upgrading to a DiT architecture, it can generate up to five "Sora-level" video clips in a single run!

In creative tests such as a cat eating noodles and a dog riding a motorcycle, PixVerse V2 performed excellently in clarity, motion quality, and aesthetics.

According to Silicon Star, PixVerse V2 is built on a DiT (Diffusion Transformer) architecture combined with a spatiotemporal attention mechanism, a staged upgrade in model capability. It can generate 8 seconds of video in a single pass while significantly improving dynamic range, detail, and realism. The other major update is one-click generation of one to five continuous video clips, with the subject, visual style, and scene elements automatically kept consistent across clips. In other words, anyone can easily produce 40 seconds of video content.
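The spatiotemporal attention idea can be illustrated with a toy, factorized version: spatial attention mixes information within each frame, while temporal attention ties the same patch position together across frames. This is a minimal sketch under assumed shapes and names, not PixVerse's actual implementation:

```python
# Illustrative sketch only: factorized spatiotemporal attention, the general
# idea behind DiT-style video models. All shapes and names are assumptions
# made for this example, not PixVerse internals.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k, v: (tokens, dim) -> (tokens, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def spatiotemporal_attention(x):
    """x: (frames, patches, dim) grid of video tokens.

    The spatial pass attends over patches within each frame; the temporal
    pass attends over frames at each patch position, which is what keeps a
    subject coherent from one frame to the next.
    """
    frames, patches, dim = x.shape
    # Spatial pass: one attention call per frame.
    x = np.stack([attention(f, f, f) for f in x])
    # Temporal pass: one attention call per patch position.
    x = np.stack([attention(p, p, p) for p in x.transpose(1, 0, 2)])
    return x.transpose(1, 0, 2)  # back to (frames, patches, dim)

tokens = np.random.default_rng(0).normal(size=(8, 16, 32))
out = spatiotemporal_attention(tokens)
print(out.shape)  # (8, 16, 32)
```

Factorizing attention this way keeps compute manageable while still letting the model tie frames together, which is what subject consistency over an 8-second clip depends on.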

Judging from the output, PixVerse V2's videos pack higher information density, conveying more within a few seconds, and the consistency upgrade makes the results far more usable. On the product side, V2 simplifies complex functions as much as possible, so beginners can create too.

We have found that with the continuous iteration of video generation model technology and products, ordinary users have shown great demand both at home and abroad. The creation of AI video content is not limited to professional groups. Ordinary users are also eager to express their creativity and ideas through simple and intuitive tools.

Looking at the product iteration of PixVerse V2 from this perspective, you will find that every function is trying to get closer to users.


1. Each iteration brings us one step closer to users

Since its launch, PixVerse has become one of the most popular video generation products with its technological innovation and deep insight into user needs.

In the latest V2 version, one of the breakthrough features is the ability to generate multiple videos at once while maintaining the consistency and coherence of the elements between the videos. This feature is of great significance for the creation of long-form video content, as it allows creators to generate a series of interrelated video clips around a theme or storyline.


In our evaluation, PixVerse V2 performed well in handling complex scenes and long video sequences. The same character can move freely between different scene settings. Moreover, coherence is not limited to visual consistency; it also includes smooth transitions in action and plot, which is especially important for narrative videos.

Another highlight is its enhanced usability. Unlike the "creative toys" on the market that only produce short clips and require endless re-rolling of generations and secondary editing, PixVerse V2 not only generates high-quality video clips but also extends them, directly outputting complete, usable creative content.

This feature means that PixVerse V2 is no longer limited to generating short shots for secondary editing, but can directly output complete videos that can be used on multiple platforms and in multiple scenes.

In actual evaluation, this feature of PixVerse V2 significantly improves the efficiency and convenience of video creation. Users no longer need to spend a lot of time on video editing and synthesis, and can focus more on creativity and content itself. Whether it is a short video for sharing on social media or a story video that requires continuity, PixVerse V2 can provide a one-stop solution. The enhanced usability further broadens the scope of application of AI video generation technology, allowing both ordinary users and professional creators to benefit from it.


The user experience of PixVerse V2 reflects both technical innovation and repeated polishing. According to Silicon Star, Aishi introduced an innovative spatiotemporal attention mechanism in the underlying model, giving PixVerse V2 a breakthrough in diffusion-based spatiotemporal modeling and significantly improving its handling of complex scenes. At the same time, its strong text understanding lets the model match text prompts to video content more accurately, achieving deep multimodal fusion.

In addition, PixVerse V2 has been optimized for computing efficiency. By improving on the traditional flow model and applying weighting to the loss function, the model converges faster, improving both the speed and accuracy of video generation. The introduction of a 3D VAE and the spatiotemporal attention mechanism further improve the quality of video compression and reconstruction, ensuring efficient transmission and storage of video content.
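As a rough illustration of the "improved flow model plus weighted loss" idea, here is a toy flow-matching objective with a timestep-dependent weight. The weighting function, shapes, and names are all assumptions for illustration, not Aishi's actual recipe:

```python
# Toy sketch of timestep-weighted flow-matching training. Everything here
# (the mid-timestep weight, the shapes, the stand-in model) is a placeholder
# chosen for illustration, not the production method.
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0, x1, t, weight):
    """x0: noise batch, x1: data batch, t in (0, 1), shape (batch, 1).

    The model is trained to predict the straight-line velocity (x1 - x0)
    at the interpolated point x_t; weighting the loss by timestep is one
    common way to speed up convergence.
    """
    x_t = (1.0 - t) * x0 + t * x1   # linear interpolation path
    target_v = x1 - x0              # constant velocity along that path
    pred_v = model(x_t, t)
    per_sample = ((pred_v - target_v) ** 2).mean(axis=-1)   # (batch,)
    return (weight(t).squeeze(-1) * per_sample).mean()

# Stand-in "model" and a weight that emphasizes mid timesteps (peaks at 0.5).
dummy_model = lambda x_t, t: np.zeros_like(x_t)
mid_weight = lambda t: 1.0 + 4.0 * t * (1.0 - t)

x0 = rng.normal(size=(4, 8))
x1 = rng.normal(size=(4, 8))
t = rng.uniform(size=(4, 1))
loss = flow_matching_loss(dummy_model, x0, x1, t, mid_weight)
print(loss >= 0.0)  # True: a valid (if untrained) loss value
```

The design choice the article alludes to is exactly this kind of knob: which timesteps the loss emphasizes changes how quickly and stably the model converges.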

Looking back at several major milestones of PixVerse since its release, we will find that behind this is not only technical strength, but also its keen grasp of market and user needs.

In May, PixVerse launched a motion brush feature. It lets users control the movement of specific regions in a video simply by drawing a trajectory, greatly improving the flexibility and intuitiveness of video creation. Use cases include, but are not limited to, animation production, advertising creative, and social media content generation. User feedback has been generally positive: the feature greatly simplifies the video editing process and makes creation freer and more intuitive.

In terms of product design, when users are basically satisfied with a result but want to adjust details, PixVerse supports secondary editing and free transformation, letting users flexibly tune the video for different platforms and scenarios. In addition, PixVerse lets users choose different styles and aspect ratios, offering a higher degree of freedom in video creation.


From motion brushes to character consistency features, to the coherent video generation of the V2 version, each update is one step closer to users. This user-centric innovation concept makes PixVerse not only a technically realized product, but also a partner for users to realize their creativity.


2. Depth determines innovation

When we evaluate whether a video generation product is a toy or a productivity tool, information density is an important indicator of content quality.

If a ball moves randomly against a blank background, it can move for an infinite amount of time but provide very little information.

In PixVerse V2, Aishi Technology uses technical means to increase the information density of generated video, freeing users from tedious editing of raw footage so they can go straight to creative content. Its relentless pursuit of consistency, which keeps the subject unchanged across multiple clips, and its other functions all serve one goal: going directly from raw footage to publishable video.

PixVerse's product manager said the team always adheres to the product philosophy of "walking with users." In the early stages of product development, the team conducts in-depth pre-research, including talking with industry practitioners, observing real users, and collecting community feedback. This all-round user research lets Aishi capture subtle but critical user needs; even small feature requests from users are taken seriously and tested.

This user-driven approach to innovation keeps PixVerse's features close to users' actual needs. Aishi's product manager shared the story behind Magic Brush, a typical example of how Aishi builds products.

Earlier this year, Runway launched Motion Brush, which lets users adjust a subject's motion trajectory by selecting different brushes and adjusting a control pad below. Seeing this feature, and through market research, user interviews, and community feedback, the team found that users do have strong demand for more flexible video editing tools, but Motion Brush's interaction is not flexible enough, and its adjustments are not precise or controllable enough.


After discovering this need, PixVerse's product team focused on discussing: What kind of interaction method can allow users to use the brush function more intuitively and simply?

Preliminary user research surfaced two findings. First, users need to adjust the motion trajectories of multiple subjects, but Motion Brush has no region-selection function. Second, once a subject is selected, its motion is varied, and up/down/left/right control buttons cannot simulate realistic movement. So for Magic Brush, the product team chose smearing-based selection, intelligent region selection, and an interaction that lets users freely draw motion trajectories in any direction.


However, while convenient for users, this interaction poses a greater technical challenge. Building on the DiT architecture, Aishi's technical team developed the core algorithm behind Magic Brush, which analyzes the user's brush input and converts it into motion in the video.
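One plausible, purely hypothetical piece of such a pipeline is turning a freely drawn stroke into per-frame motion targets. The sketch below resamples a 2D stroke by arc length into one offset per frame; the function and its interface are invented for illustration, since the real algorithm is not public:

```python
# Hypothetical sketch: resample a user-drawn brush stroke into one
# displacement per video frame, so the painted region can follow the stroke
# at uniform speed. Illustrative only; not PixVerse's actual algorithm.
import numpy as np

def stroke_to_frame_offsets(points, num_frames):
    """points: (n, 2) raw brush coordinates; returns (num_frames, 2)
    offsets relative to the stroke start, spaced evenly by arc length."""
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])     # cumulative length
    targets = np.linspace(0.0, arc[-1], num_frames)   # even spacing
    x = np.interp(targets, arc, points[:, 0])
    y = np.interp(targets, arc, points[:, 1])
    return np.stack([x, y], axis=1) - points[0]       # offsets from start

# A right-then-up "L"-shaped stroke, resampled for an 8-frame clip.
stroke = [(0, 0), (10, 0), (10, 10)]
offsets = stroke_to_frame_offsets(stroke, num_frames=8)
print(offsets[0], offsets[-1])  # [0. 0.] [10. 10.]
```

Resampling by arc length rather than by raw input points is what makes the motion speed independent of how fast the user happened to draw.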

Magic Brush went from identified user need to product spec, technical solution, and launch in just one month. That speed is inseparable from Aishi's "short, flat and fast" (lean and rapid) operating model.

The marketing department can quickly collect user feedback and convey it to the product and technical teams in a timely manner. This rapid information flow and decision-making process enables Aishi to quickly consider the feasibility of demand, conduct A/B testing, and make decisions quickly. This agility is a unique advantage of startups and is also the key to Aishi's ability to quickly iterate products.

Compared with large technology companies, startups have certain advantages in response speed and flexibility. This agility is not only reflected in product development and market strategy, but also has a profound impact on corporate culture and organizational structure. Due to their small size, startups are more flexible in resource allocation. They can quickly transfer resources from one task to another, or from one project to another, thereby ensuring maximum resource utilization.

At the same time, startups pay more attention to user feedback and put user needs at the center of product development. This user-driven approach lets them launch products that meet user needs more quickly. Many startups adopt agile development methods, which emphasize rapid iteration and continuous improvement: by regularly releasing new features and fixing bugs, products reach the market faster and are optimized based on user feedback.

The agility and deep innovation embodied by Aishi are the unique advantages of startups in the new era of big models.


3. Good technology needs good products all the more

Today, there is still a long road between technology and users, and products are the most important connector. Technology is the driving force of innovation, but its value is only truly realized when it reaches users through products.

During the development of PixVerse, the AiShi Technology team has carefully polished every detail. In version V2, in order to enhance the usability of the video, PixVerse supports secondary editing of the generated results. Through intelligent content recognition and automatic association functions, users can flexibly replace and adjust the video subject, action, environment and camera movement, further enriching the possibilities of creation.


Aishi's head of product also said: "Technology's strengths and differences are certainly important, but productization, along with the user barriers and technical feedback loops formed by connecting products to more and more users, is more critical."

At the same time, in the early stages of technology development, products are also the starting point for technology implementation. Advanced AI technology is transformed into actual product functions that users can perceive and operate. This transformation from technology to products not only accelerates the application of technology, but also provides users with unprecedented convenience and creativity.

Especially when the technology has not yet reached its target level, this two-way reinforcement between technology and product becomes all the more practical.

For example, regarding the Magic Brush mentioned above, one creator gave feedback: "At this stage, when the base model capability is not enough to allow multiple subjects to move according to the physical world, customizing the motion brush can increase the creative space. Some characters' blinks, expressions, and complex relative movements can all be accomplished with the motion brush."

At present, although text-generated videos are conceptually attractive, they face limitations in generating content in practice. Due to the limited information density of text itself, it is often difficult to convey all the details of complex scenes and dynamic changes when directly converting text descriptions into video content. Therefore, image-generated videos have emerged as a phased solution.

Compared with text-generated videos, image-generated videos can provide higher information density because they are generated based on visual information and can more accurately capture and reproduce the complexity of the scene. When the base model capability has not yet solved the difference in information density between text and video, the introduction of image-generated videos is not only a reflection of technological progress, but also an innovation in product design ideas.

In this early stage of the technology, Aishi's strategy has been to pursue technological breakthroughs while paying even more attention to product implementation: building user barriers on top of technical barriers, and product barriers grounded in understanding and insight into users, thereby pushing both the boundaries of the technology and the limits of the product.

Only a video generation product that can be truly used can eventually be retained. It must not only meet the current needs of users, but also look to the future and choose a sustainable technology development path.

Whether it is the early dogged focus on consistency technology, the iteration of Magic Brush, or the brand-new upgrade of PixVerse V2, Aishi's product strategy is grounded in the present, solving practical problems, while also looking to the future, choosing a path of long-term development and committing to building sustainably.

In the video generation industry, regardless of company size, it takes continuous technological breakthroughs, deep user insight, and careful polishing of product details to create truly valuable products that inspire creativity.

Just as PixVerse V2's slogan says: Unleashing Creative Potential for Everyone. This is an opportunity not only for PixVerse, but for all creators in the era of large models.