
Zhipu AI officially open-sourced their "Sora", Qingying

2024-08-06


It was the middle of the night, and this time it wasn't the companies across the ocean making a move, it was one of ours.

I really want to sleep, really.

It all started when I was browsing GitHub before bed and happened to see the account THUKEG updating a project.

CogVideoX

THUKEG is Zhipu's official account, and CogVideoX is the base model behind Qingying, Zhipu's second-generation AI video product that got a lot of attention over the past two weeks.

In the simplest terms, CogVideoX is to Qingying what GPT-4o is to ChatGPT: one is the model, the other is the product built on top of the model, so you can pretty much draw an equals sign between them.

Two weeks ago, in the second-generation AI video war, on top of the existing big three of Runway, Keling, and Luma, Pixverse launched its V2 and Vidu finally released its model.

As one of the most high-profile AI companies in the large-model field, Zhipu also joined the AI video melee and released its DiT-based video product, Qingying.

This product can be used on their AI assistant Zhipu Qingyan.

But to be frank, I didn't write about it at the time, because I felt its generation quality still had a clear gap compared with Keling and Runway.

Then, two weeks after Qingying's release, they decided to open-source CogVideoX.

Now that is worth bragging about.

CogVideoX model download address:

Currently, all the mainstream AI video models are closed source. There is an open-source one called Open-Sora, but to be honest, its results are far from satisfactory.

As for Qingying, although it still isn't on par with the mainstream closed-source models, it is at least usable for generating some content.

This time, from a quick look through it, what they have open-sourced is the smaller model, CogVideoX-2B.

Inference requires 18 GB of VRAM. That is to say, with a single 3090 or 4090 you can run video generation locally without paying anyone. However, peak usage can reach 36 GB, so the VRAM will most likely blow up.

But they themselves said that they will optimize it soon.

But I only have a lousy 4060 with 8 GB of VRAM, so I can't run it even after they optimize it. To be honest, I really don't have the money for a 4090 = =

It would be great if the AI video model could be as popular as SD1.5 so that everyone can run it.

This 2B model generates videos 6 seconds long, at 8 frames per second, with a resolution of 720*480.

These specs remind me of the first generation of Dream.
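For reference, running the 2B model locally might look roughly like this. This is a minimal sketch, assuming the Hugging Face diffusers integration; the Hub id "THUDM/CogVideoX-2b" and the parameter values are my assumptions, not something stated in this post.

```python
# A minimal local text-to-video sketch for the 2B model.
# Assumption: the diffusers integration exists and the Hub id is THUDM/CogVideoX-2b.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()  # trades speed for VRAM so a single 24 GB card survives the peaks

prompt = "An exquisite wooden toy boat glides across a plush blue rug that simulates ocean waves."

video = pipe(
    prompt=prompt,
    num_inference_steps=50,   # more steps: slower, but usually cleaner motion
    guidance_scale=6.0,       # how strongly the output follows the prompt
).frames[0]

export_to_video(video, "toy_boat.mp4", fps=8)  # 6 s at 8 fps, 720*480, per the spec above
```

The CPU-offload line is the kind of trade-off the official optimization will presumably handle better; here it just moves idle sub-models off the GPU so the peak fits on a consumer card.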

Here are a few of their official examples (you can actually run a few yourself on Qingying and get similar results):

An exquisite wooden toy boat, with intricate patterns carved into its mast and sails, glides smoothly across a plush blue rug that simulates ocean waves. The boat is painted a rich brown and features small windows. The rug is soft and textured, providing a perfect backdrop, like a vast ocean. The boat is surrounded by various toys and children's items, suggesting a childlike environment. The scene captures the innocence and imagination of childhood, with the toy boat's journey symbolizing endless adventures in a whimsical interior setting.

The camera follows behind a white vintage SUV with a black luggage rack on the roof as it drives quickly down a steep dirt road surrounded by pine trees on a steep hillside. Dust is flying from the tires and the sun shines on the SUV as it drives quickly on the dirt road, casting a warm glow on the entire scene. The dirt road curves gently into the distance, with no other vehicles in sight. The trees on both sides of the road are redwoods, with a few green plants scattered around. From the rear, the car moves along the curve with ease, as if driving over rough terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky and misty white clouds overhead.

In a war-torn city, ruins and broken buildings speak of devastation, and against this heartbreaking backdrop, a poignant close-up captures a young girl. Her face, smeared with ash, bears silent witness to the chaos around her. Her eyes gleam with sadness and resilience, capturing the raw emotion of a world stripped of its innocence by conflict.

That is roughly it for inference. Now that it is open source, what I am really looking forward to is the fine-tuning and plugin ecosystem.

Take the SD1.5 model everyone still uses for AI image generation: the base model is honestly terrible, but it is open source, and a bunch of community wizards have built excellent models on top of SD1.5, such as Majic, DreamShaper, Anything and so on.

And CogVideoX can also be fine-tuned.

It reminds me of the AI video model that Jieyuexingchen and Shanghai Film Group showed at WAIC: they took 200 minutes of Calabash Brothers footage and trained a Calabash Brothers model.

Everything it produces is in the Calabash Brothers style, and you don't have to fight to keep the characters consistent. If I type Big Wa, Big Wa comes out. If I write about the grandpa and the snake spirit drinking beer together, then that's exactly who shows up.

And now CogVideoX is open source. If it can be fine-tuned, people who use AI to make short dramas and long series can try fine-tuning their own video model to lock in their own style and characters.

I have always felt that the ceiling and expressive range of text-to-video is much higher than that of image-to-video, but the two biggest hurdles are style consistency and character consistency. If the model can be fine-tuned, there are plenty of ways to solve both.

Fine-tuning CogVideoX-2B requires 40 GB of VRAM. Ordinary consumer cards are not enough; you need a workstation card like an A6000.
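If you don't have 40 GB to spare, a lighter route is parameter-efficient fine-tuning. The sketch below attaches LoRA adapters to the pipeline's transformer with the peft library. This is not Zhipu's official fine-tuning recipe; the idea that the diffusers pipeline exposes the DiT as `pipe.transformer`, and the attention-projection names, are my assumptions.

```python
# A minimal LoRA-attachment sketch, NOT the official fine-tuning recipe.
# Assumptions: the diffusers CogVideoX pipeline exposes its DiT as
# `pipe.transformer`, and its attention projections follow the usual
# diffusers naming (to_q / to_k / to_v / to_out.0).
import torch
from diffusers import CogVideoXPipeline
from peft import LoraConfig, get_peft_model

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,             # adapter rank: low rank keeps the trainable weights tiny
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections only
)

# Wrap the transformer so that only the injected LoRA weights are trainable;
# the 2B base weights stay frozen, which is what keeps the VRAM bill low.
transformer = get_peft_model(pipe.transformer, lora_config)
transformer.print_trainable_parameters()

# From here the usual diffusion training loop applies: encode your clips with
# pipe.vae, add noise with the scheduler, predict it with `transformer`, and
# backprop only through the LoRA parameters.
```

A LoRA trained this way can capture a style or a recurring character without touching the base weights, which is exactly the "Calabash Brothers model" idea above.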

Still, it is a video model after all; this is not something the general public will do at home. But for some startups and small studios, the threshold is now close to zero.

Because it is open source, they no longer have to spend untold sums and take on the risk of building a large model from scratch. They just need to buy a few cards, tens or hundreds of thousands of yuan in total, and then they can fine-tune locally.

I have always believed that the future of open source will be better than that of closed source.

On the night Zuckerberg released Llama 3.1 405B a while back, he posted a lengthy open letter on Facebook.

There was a passage in it that left a deep impression on me.

Translated:

I firmly believe that open source is essential to achieving a positive AI future. AI has the potential to increase human productivity, creativity, and quality of life more than any other modern technology, and accelerate economic growth while promoting advances in medical and scientific research. Open source will ensure that more people in the world can enjoy the benefits and opportunities brought by AI, prevent power from being concentrated in the hands of a few companies, and enable this technology to be more evenly and safely promoted throughout society.

Prevent power from being concentrated in the hands of a few companies, and let this technology develop across the whole of society in a more balanced and safer way.

Open source is the best means to that end. Closed source will not bring technological equality, but open source will, because AI is not an entertainment tool; it is a productivity tool, and its adoption is driven mainly by companies, research institutions, and the like.

Every company has three major pain points when using AI:

1. They need to train, fine-tune, and refine their own models.

2. They need to protect their private data.

3. They hope to turn their AI into a long-term standard ecosystem.

All of this can be summed up in one sentence:

We need to be able to control our own destiny rather than leave it to others.

In China, Zhipu is a company I find very special: it is often compared to OpenAI, yet it has a Meta-like temperament.

You have to understand that Meta's business model is completely different from that of large-model companies like OpenAI. Meta does not make money by selling access to large models, so open-sourcing them doesn't really hurt Meta.

But Zhipu is different: Zhipu is a large-model company.

Yet even with that at stake, they still resolutely open-sourced it.

Maybe they are just like Meta, working for a lofty belief: "to enable this technology to be promoted more evenly and safely throughout society."

Apart fromIn addition to CogVideoX, they have also open-sourced many other things.

Go to their GitHub and you will find a lot of surprises:

I love every company that is willing to open source.

I look forward to the day when countless developers build plugins and fine-tuned models on top of CogVideoX, and every company in video-related industries, film, television, short dramas, advertising and so on, has its own models and its own video-generation workflows.

Just as SD is thriving inside companies everywhere today.

I admire Zhipu.

This is not just a technical decision; it is also the passing on of a belief.

The lights on the other side of the ocean gradually went out.

And over here, our dawn

is rising.

That's all for this post. Since you have read this far, if you found it good, please tap like, "reading", and share. If you want to get new posts pushed to you as soon as possible, you can also give me a star ⭐~ Thank you for reading my article, see you next time.
Author: Kha'Zix