
Flux fine-tuning has swept the entire Internet, and one guy assembled a whole team of Marvel heroes!

2024-08-19



New Intelligence Report

Editor: Editorial Department

【New Intelligence Introduction】The king of open-source AI image generation is born! Half a month after its release, Flux has become the favorite to replace Midjourney. Developers of all stripes have begun fine-tuning LoRAs on their own photos, and one person can pull off any number of styles.

Not since Midjourney have we seen people this crazy about an AI image generation app.

The emergence of Flux means that AI image generation has entered a new stage.

Even Musk said he could no longer tell real from fake.

First, a realistic photo of a TED speaker swept the Internet. Then Grok 2, which integrates the Flux model, broke through its guardrails, and netizens went wild with it.

Recently, Flux users have also started fine-tuning their own LoRA models.

A co-founder of HuggingFace exclaimed that Flux has completely taken over the open-source AI community: he had never seen a model with so many derivatives, online platforms, and demos dominating the trending list.



One developer who fine-tuned the model said: "Flux + LoRA will disrupt the generative AI market. You can place yourself anywhere, wearing anything you like, and generate a different version of yourself."


For example, transform yourself into Superman.


Pick up a lightsaber, become a Jedi Knight, and may the Force be with you.


Not only that: ice sculptures, photos of yourself holding a Switch console, elf ears, fashion-show shots, and more are all just a prompt away.







Fine-tuning your own LoRA has now become a new toy for many developers.

Now, the entire network is flooded with Flux+LoRA.

One person can form the "Avengers"

Rowan Cheung, the founder of Rundown AI, trained a LoRA on Flux using his own photos as data, then fed the results into Runway to animate them.


As shown below, it generated an image of him as a TED-style speaker.


In the animated video, he really does look like the speaker in the photo. The only flaw is that toward the end, his right hand is down to two or three fingers.


Another one shows him saving the world as Superman.


With the help of animation, he finally becomes a Marvel-style hero.


He also generated photos of himself walking a runway in high fashion.


With the audience on both sides applauding enthusiastically, it feels like a real catwalk.


In addition, Rowan Cheung created different styles of himself, all blending into their scenes naturally, with no sense of incongruity.






He believes that although AI-generated images still can't replace full movies or commercials, they already have many important uses, especially for content creators.

For example, these AI images are used as previews and illustrations for news, as well as supplementary materials (B-roll) in short films.

After watching the videos, netizen Min Choi said he could assemble his own "Avengers."


A former Intel CTO also fine-tuned his own LoRA model on an A100; it took 75 minutes and cost $7 (about RMB 50).






There are also developers who turned their own photos into horror-movie stills.







Can't tell AI from reality

The most popular style is still a lightly tuned hyperrealism, which makes the boundary between imagination and reality increasingly hard to see.



Is it a real photo or a person drawn by AI?



After LoRA training on Flux-Dev, both scene complexity and realism have made incredible progress.


Any style can be fine-tuned

In addition, there are also many fine-tunings in various styles.

Pixel style

One developer fine-tuned a LoRA on the pixel-art style of the legendary ZX Spectrum.


The generated images below include Dragon Ball's Son Goku, Marvel's Iron Man, and what appears to be Trump.













Animated graffiti

Davis Brown, a generative AI product designer for Photoshop, fine-tuned a half_illustration model based on Flux.

The images it generates are part realistic photo, part illustrated doodle.


To generate each image, just add "In the style of TOK" at the beginning of the prompt.

Then describe the desired effect in detail, and the image comes right out.

He feels that in the future he may not even need Photoshop; AI alone can produce the picture.


prompt:In the style of TOK, a photo editorial avant-garde dramatic action pose of a woman short blue hair wearing 70s round wacky sunglasses pulling glasses down looking forward, in Tokyo with large marble structures and bonsai trees at sunset with a vibrant illustrated jacket surrounded by illustrations of flowers, smoke, flames, ice cream, sparkles, rock and roll


prompt:In the style of TOK, a photo editorial dramatic action pose of a person piercing eyes, tattoos on face, with creative bucket hat, standing in Tokyo with large marble structures and white purple trees in a Basketball court, with a vibrant illustrated street wear puffy vintage jacket, black shirt, volcano in the background, surrounded by illustrations of smoke, flames, and flowers, fog, exclamation marks, lines shooting outwards, minion characters, butterflies

There are other graffiti-style photos as well.







3×3 grid

The open-source dataset platform LAION used the Flux model to train a LoRA that generates 3×3 grids of photos of the same person from different angles.


From now on, just taking one selfie will be enough.





Different ages

With Flux + LoRA, you can see what a person looks like across an entire lifetime.







Another example:







Super playability

Today's protagonist, FLUX.1, uses a brand-new "flow matching" technique.

While previous diffusion models create images by gradually removing noise starting from a random starting point, flow matching takes a more direct approach, learning the precise changes needed to transform noise into a realistic image.

This difference in approach results in a unique aesthetic and offers significant advantages in speed and control.
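That "more direct approach" can be written down concretely. Black Forest Labs has not published FLUX.1's exact objective in this article, so treat the following as a representative sketch of a common flow-matching formulation (rectified flow): the model learns a velocity field that carries noise to data along straight lines.

```latex
% Straight-line interpolant between a noise sample x_0 and an image x_1
x_t = (1 - t)\, x_0 + t\, x_1, \qquad t \in [0, 1]

% The model v_\theta regresses the constant velocity along that path
\mathcal{L}(\theta) =
  \mathbb{E}_{t,\; x_0 \sim \mathcal{N}(0, I),\; x_1 \sim p_{\mathrm{data}}}
  \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2
```

Compared with a diffusion model's noise-prediction target, the regression target here is simply the straight-line displacement from noise to image, which is what makes the transformation "direct."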

Text: it can handle most of it

One of the challenges of text-to-image generation is accurately rendering written text inside the image. FLUX.1 handles this quite well, even in tricky cases like memes.

prompt:

This is a fine dog meme underwater. Text: 'Climate change is fine'


prompt:

A meme of a famous actor making a funny face with the text 'When you forget your lines' in a quirky font


Nice light and texture

FLUX.1 has a keen understanding of light, shadow and texture, producing consistently high-quality images.

prompt:

A detailed image of a garden where the flowers are made of delicate glass, reflecting the sunlight beautifully


In this image, the emphasis is not only on the texture of the glass, but also on how the light is refracted and transmitted through the petals, creating a glowing effect.

prompt:

Owl feathers merging with autumn leaves in wind


Artistic style: more than just imitation

FLUX.1 seems to have mastered the principles behind various artistic styles, making creative reinterpretations possible.

prompt:

watercolor of famous wave painting


This “watercolor” version of “The Great Wave off Kanagawa” not only suggests that the iconic wave was part of the model’s training data, but also highlights how “flow” technology can approximate the movement of pigment in water, paper, and ink.

Composition: Make the scene meaningful

FLUX.1 excels at building complex scenes, placing objects and characters in a way that is both realistic and visually appealing.

prompt:

A realistic image of an enchanted library where books float in mid-air and the shelves are made of ancient, twisted roots


"Flow": A new visual language

The flow matching technology used by FLUX.1 gives the image a unique sense of organic movement and fluidity, as if the pixels themselves are flowing.

prompt:

Dog with swirling, Van Gogh-style fur patterns


There is always a tool that can help you

We can summarize the image generation process as: take some input pixels, nudge them away from noise and toward the pattern described by your text input, and repeat until a set number of steps is reached.
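That loop can be sketched in a few lines. This is a toy with made-up numbers, not FLUX's actual sampler: an "image" of four pixels starts as pure noise and is nudged toward a target pattern over a fixed number of steps, with the ideal straight-line velocity standing in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                 # toy "image": 4 pixels of pure noise
target = np.array([1.0, 0.5, -0.3, 0.8])  # stand-in for the text-conditioned pattern

steps = 28                                 # a typical inference step count
for i in range(steps):
    t = i / steps
    # Ideal straight-line (rectified-flow) velocity toward the target;
    # in a real model this prediction comes from a trained network.
    velocity = (target - x) / (1 - t)
    x = x + velocity / steps               # one small Euler step

# After the final step, x has arrived at the target pattern.
```

The step count (28) mirrors the inference setting used later in this article; the pixel values and target are arbitrary.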

The fine-tuning process takes each image/annotation pair from the dataset and updates its internal mapping slightly.

You can teach the model anything this way as long as it can be represented by an image-caption pair: characters, scenes, mediums, styles, genres.


Left: generated with the original FLUX.1 model; right: generated with the fofr/flux-bad-70s-food model using the same prompt and seed

During training, the model will learn how to associate these concepts with a specific text string. In the prompt, you need to include this string to activate this association.

For example, you want to fine-tune a "comic book style superhero" model.

First, you need to collect a large number of images of characters as a dataset, including but not limited to: different scenes, costumes, lighting, and even different art styles.

Then, choose a short, uncommon word or phrase as your trigger: something unique that won't collide with concepts the model already knows. You might choose something like "bad 70s food" or "JELLOMOLD."

After training is complete, you simply give it a prompt containing a trigger word, such as "scene shot of bad 70s food at a party in San Francisco," and the model will call upon the specific concepts you added during previous fine-tuning.

It's that simple.
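Mechanically, the trigger is just a string you prepend to every training caption and, later, to every generation prompt. A minimal sketch, where the helper name and captions are ours and "bad 70s food" is the article's example trigger:

```python
# The trigger string the model will learn to associate with the new concept.
TRIGGER = "bad 70s food"

def with_trigger(description: str) -> str:
    """Prefix a caption or prompt with the trigger string, so the model
    binds the fine-tuned concept to it."""
    return f"{TRIGGER}, {description}"

# Training captions for the image/caption pairs in the dataset ...
train_captions = [
    with_trigger("a gelatin salad on a checkered tablecloth"),
    with_trigger("a casserole under harsh flash lighting"),
]

# ... and later, a generation prompt that activates the concept.
prompt = with_trigger("scene shot at a party in San Francisco")
```

Any prompt without the trigger falls back to the base model's ordinary behavior, which is why a unique string matters.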

After understanding the principles, we can choose any tool to fine-tune the model.



For example, a guy named Matt Wolfe saw the cool generations above and curiously tried it himself.

The result was a total flop...

The AI images he produced were a textbook case of "what you ordered vs. what arrived."

This is what he generated -


This is someone else's-


The difference between the two pictures lies in whether LoRA fine-tuning is used or not.

The excited guy did some digging and was pleasantly surprised to find that LoRA models are tiny, only 2 MB to 500 MB, and can easily be combined with an existing model.


What's even more surprising: no extra computing power or full retraining is needed for the model to improve image quality, produce a unique style, or generate specific characters such as Mario or SpongeBob SquarePants.
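The small file size follows directly from how LoRA works: instead of storing a full-rank update to each weight matrix, it stores two low-rank factors. A rough sketch with hypothetical dimensions (the layer width and rank below are illustrative, not Flux's actual shapes):

```python
import numpy as np

d, r = 1024, 16                    # hypothetical layer width and LoRA rank
full_update = d * d                # parameters in a full-rank update: 1,048,576
lora_update = d * r + r * d        # parameters in the two factors: 32,768 (~3%)

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)) * 0.01   # frozen base weight (ships with the model)
B = rng.standard_normal((d, r)) * 0.01   # trained LoRA factor, d x r
A = rng.standard_normal((r, d)) * 0.01   # trained LoRA factor, r x d

# At inference the adapter is merged into the base weight, so no extra
# compute is needed afterwards, and only B and A ever hit the disk.
W_adapted = W + B @ A
```

Only the two small factors are trained and saved, which is why a LoRA file is megabytes rather than the gigabytes of the base checkpoint.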


Unfortunately, Glif, the platform he knows best, doesn't support LoRA with Flux.


He discovered that one of the ways to use Flux was with ComfyUI.


I believe many people are familiar with this picture.

Alternatively, you can use a platform like Replicate, HuggingFace Spaces, or Fal AI.


After trying the Fal platform, he found it costs $0.035 per megapixel, so a single dollar buys roughly 29 runs, which is quite cost-effective.
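The arithmetic is easy to sanity-check. The per-megapixel price is the article's figure; Fal's actual billing granularity may differ, and the run count depends on resolution (a 1024×1024 image is slightly over one megapixel, so the article's "29 times" implies images just under a megapixel):

```python
price_per_megapixel = 0.035                  # USD, figure quoted in the article
megapixels = 1024 * 1024 / 1_000_000         # ~1.05 MP for a 1024x1024 image
cost_per_image = price_per_megapixel * megapixels

runs_per_dollar = int(1 / cost_per_image)    # 27 at this resolution
```

At exactly one megapixel per image the same dollar buys 28 runs, so the quoted 29 is in the right ballpark either way.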


FLUX.1 dev, Flux Realism LoRA, FLUX.1 pro, and others are all available there.

Without hesitation, the guy chose Flux Realism LoRA.

After some tweaking, he set the inference steps to 28 and the CFG to 2.


The resulting images are amazing!

If there is any flaw, it is that the lighting on the forehead wrinkles is still unnatural.


Next, the guy excitedly imported the image into Gen-3 Alpha, and Gen-3 Alpha generated a video based on the prompt he entered.

Except for the moment when the microphone in his hand suddenly "floated", there was nothing wrong with the rest of the video.


The guy tried it again and generated a second video.


This time, the microphone seemed too still, as if it was fixed in place.


In addition, the guy joined the Internet-wide trend of transforming himself, generating a series of hilarious photos.












Finally, the guy used Gen-3 Alpha to turn it into a video, allowing himself and Deadpool to walk in the same movie scene.


References:

https://x.com/dr_cintas/status/1824480995317350401

https://x.com/Gorden_Sun/status/1824843049421484309

https://replicate.com/blog/fine-tune-flux

https://x.com/laion_ai/status/1824814210758459548

https://www.youtube.com/watch?v=_rjto4ix3rA

https://www.youtube.com/watch?v=rDu481JFwqM