
The Kuaishou Keling team's latest open-source project has gone viral: an uncle turns into a young girl, and the GitHub repo has 7.5K stars

2024-07-23


It's outrageous!! Without watching the full video, who would ever guess that the beautiful girl in it is actually an uncle?

[Unfortunately, the video can't be embedded here... you can watch it on the QuantumBit official account~]

Well, it turns out this was made with the Kuaishou Keling team's controllable portrait video generation framework, LivePortrait.

LivePortrait took off as soon as it was open sourced, racking up 7.5K stars on GitHub in short order.

It even drew Thomas Wolf, co-founder and Chief Science Officer of Hugging Face, to try it out for himself:



Even now, it still ranks No. 1 on the trending list across all Hugging Face applications.



So, why is LivePortrait so popular?

That popularity starts with its eye-catching results...

Letting expressions "graft" from one face to another

LivePortrait, open sourced by the Kuaishou Keling large-model team, can generate a dynamic video from a single source image.



Let’s first look at a set of official outputs.

Starting from the simplest case: throw in a still image, and LivePortrait can make the portrait blink, smile, or turn its head.

You can also "graft": copy expressions, movements, and so on onto other people, regardless of style (photorealistic, oil painting, sculpture, 3D render) or size~



Of course, this "magic" isn't limited to a single person; even a family photo is not out of the question. [doge]



Beyond turning still images into video, LivePortrait can also apply the "smile-boosting technique" to one or more videos.

For example, given a video of an expressionless baby (far right), we can make the baby wink or smile by following a reference video.



By the way, it isn't limited to human portraits: kittens and puppies can start acting cute and coquettish too.



In short, LivePortrait enables precise control over facial expressions: for example, you can set the curvature of the mouth corners and how wide the eyes open.

For example, the two clips below show how the subject's eye size changes under different parameter settings:





It seems the web-novel cliché of "three parts coldness, three parts mockery, and four parts nonchalance" is no longer out of reach. [doge]
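Under the hood, a slider like this maps naturally onto the keypoint representation. Here is a minimal sketch of the idea, assuming (K, 3) keypoints and a hypothetical `adjust_eye_openness` helper; it is illustrative only, not LivePortrait's actual API:

```python
import torch

def adjust_eye_openness(keypoints: torch.Tensor, eye_idx: list, ratio: float) -> torch.Tensor:
    """Scale the vertical spread of the eye keypoints around their center.

    keypoints: (K, 3) implicit keypoints; eye_idx: rows belonging to one eye;
    ratio > 1.0 opens the eye wider, ratio < 1.0 narrows it.
    """
    kp = keypoints.clone()
    center_y = kp[eye_idx, 1].mean()
    kp[eye_idx, 1] = center_y + (kp[eye_idx, 1] - center_y) * ratio
    return kp

kp = torch.randn(21, 3)                                    # 21 keypoints, for illustration
wider = adjust_eye_openness(kp, [0, 1, 2, 3], ratio=1.5)   # hypothetical eye indices
```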

Whether or not this excites you, netizens certainly can't stop themselves from having fun with it.

For example, add some eerie lighting and make faces, and it plays like a horror movie:



Another example is real-time transformation into an anime ("2D") character:



After looking at these examples, let’s dig into the technical principles behind them.

The red-hot open-source framework

Unlike today's mainstream diffusion-based methods, LivePortrait explores and extends the potential of implicit-keypoint-based frameworks.

Specifically, LivePortrait does not rely on explicit, visible landmarks or feature points in the image; instead, it infers keypoint locations implicitly, by learning patterns from the training data.
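To make "implicit" concrete, here is a toy sketch of the idea (stand-in code, not the team's model): a small network regresses K 3D keypoints straight from pixels, and only the downstream reconstruction loss decides where those points land.

```python
import torch
import torch.nn as nn

class ImplicitKeypointDetector(nn.Module):
    """Toy detector: regress K 3D keypoints with no landmark supervision."""
    def __init__(self, num_kp: int = 21):
        super().__init__()
        self.num_kp = num_kp
        self.encoder = nn.Sequential(                       # tiny stand-in backbone
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_kp * 3)               # (x, y, z) per keypoint

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image)).view(-1, self.num_kp, 3)

# Keypoints from a source image and a driving frame then parameterize a warp
# field; reconstruction gradients, not labels, decide where the points land.
kp = ImplicitKeypointDetector()(torch.randn(1, 3, 256, 256))  # -> (1, 21, 3)
```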

On this basis, LivePortrait trains its model from scratch in two stages.

In the first stage, LivePortrait makes a series of improvements over earlier implicit-keypoint frameworks (such as Face Vid2Vid).



These improvements include high-quality data curation, mixed image-and-video training, an upgraded network architecture, scalable motion transformation, landmark-guided implicit keypoint optimization, and cascaded loss terms.

Together, these further improve the model's generalization, expressiveness, and texture quality.
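Of these, mixed image-and-video training is the easiest to illustrate. A common trick, which the sketch below assumes (it is my reading, not the team's actual pipeline), is to treat a still photo as a one-frame clip so that images and videos share one sampling path.

```python
import random
import torch

def sample_training_pair(clip: torch.Tensor):
    """clip: (T, C, H, W); a still image enters as a one-frame clip (T == 1)."""
    if clip.shape[0] == 1:
        # Still image: source == driving; it still teaches appearance/identity.
        return clip[0], clip[0]
    # Video: two random frames become a (source, driving) pair for motion.
    i, j = random.sample(range(clip.shape[0]), 2)
    return clip[i], clip[j]

photo = torch.randn(1, 3, 256, 256)    # high-quality still, as a 1-frame "video"
video = torch.randn(30, 3, 256, 256)   # a 30-frame training clip
src, drv = sample_training_pair(video)
```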

In the second stage, the model trains a stitching module and eye/lip retargeting modules, which let it handle the fine details of facial expressions more accurately.



The stitching module enhances generalization through cross-identity motion training, estimating expression changes and refining the implicit keypoints.

The eye and lip retargeting modules handle the deformations of the eyes and mouth separately; each has its own objective combining a pixel-consistency term with a regularization loss, improving the model's flexibility and accuracy on complex expressions.
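As a rough illustration of what "pixel consistency plus regularization" could look like (my own sketch, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def retargeting_loss(pred, target, region_mask, kp_offsets, reg_weight=0.01):
    """pred/target: (B, C, H, W) frames; region_mask: (B, 1, H, W) eye/lip mask;
    kp_offsets: keypoint offsets predicted by the retargeting module."""
    # Pixel consistency, restricted to the retargeted region (eyes or lips).
    pixel = F.l1_loss(pred * region_mask, target * region_mask)
    # Regularization: keep the predicted offsets small and stable.
    reg = kp_offsets.pow(2).mean()
    return pixel + reg_weight * reg
```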

So, how does LivePortrait perform specifically?

The research shows that in same-identity driving comparisons, LivePortrait achieves better generation quality and driving accuracy than existing methods, capturing the subtle eye and mouth expressions of the driving frames while preserving the texture and identity of the reference image.





In cross-identity driving comparisons it also performs well, though its generation quality is slightly behind the diffusion-based AniPortrait. In exchange, LivePortrait is far faster at inference and requires far fewer FLOPs.





In summary, on an RTX 4090 GPU, LivePortrait generates frames at 12.8 milliseconds per frame (roughly 78 FPS), dramatically faster than existing diffusion-based methods.
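For reference, per-frame figures like this are typically measured with warm-up runs and explicit GPU synchronization; here is a generic timing sketch, where `model` stands in for any frame generator (not LivePortrait's actual entry point):

```python
import time
import torch

@torch.no_grad()
def ms_per_frame(model, frame, warmup: int = 10, runs: int = 100) -> float:
    for _ in range(warmup):            # warm up kernels and the allocator
        model(frame)
    torch.cuda.synchronize()           # CUDA is async: sync before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(frame)
    torch.cuda.synchronize()           # ...and after
    return (time.perf_counter() - start) / runs * 1000.0

# At 12.8 ms/frame, throughput works out to about 1000 / 12.8 ≈ 78 FPS.
```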

One More Thing

To close with the latest official notice: Keling AI will soon launch its service globally.

Sora still hasn't arrived, but this time Keling has gone out into the world first~