The original Stable Diffusion team officially announces a new company! A burst of new models shakes up the AI image-generation landscape

2024-08-02

Hengyu, reporting from Aofei Temple
QbitAI (Quantum Bit) | Official account QbitAI

Just now, the original Stable Diffusion team announced a new startup!

Robin Rombach, one of the two main authors of Stable Diffusion, announced his departure from Stability AI in March. Now he and a dozen or so former colleagues have officially announced that they are starting a company together.

The new company is called Black Forest Labs. Right at its debut it released a series of three image generation models in one go, two of which are openly available.

They also support Chinese-language input.

How good are the results? People online who have seen them are calling it wild!

One prompt is enough to test both the image quality and the safeguards around real faces:

A teenage girl wearing a ski mask is doing origami in a barn. There is designated yellow text at the bottom of the picture. There is a framed photo of Obama in the background.

Judging from this prompt-and-image pairing alone, one user exclaimed that it was the best image generation result he had ever seen.

What stands out about this company is how openly and directly it does things.

The company was officially established today, a series of models were released today, and the progress of financing was also announced.

It has completed a $32 million financing round led by a16z, with investments from Oculus VR co-founder Brendan Iribe, former YC partner Garry Tan, Timo Aila, who leads the computer graphics research group at NVIDIA Research, and Vladlen Koltun, a distinguished scientist at Apple (formerly chief scientist for intelligent systems at Intel).

It is fair to say that Black Forest Labs has won both the backing of investors and the favor of industry leaders.

AI expert Andrej Karpathy also posted his congratulations online and praised Black Forest Labs' new model:

Wow! The open source FLUX.1 image gen model looks very powerful.

And please note that the open source license is the permissive Apache 2.0.

Black Forest Image Generation Model Debuts

Even Karpathy is excited, so let's take a look at what Black Forest Labs' models can do.

Here QbitAI has selected five categories of generated results to show. All images are official samples, and it is not stated which specific model produced each one.

Level 1: text generation.

Prompt: Photo of an old classroom blackboard. On the blackboard is written "let's make some really pretty stuff together" with a red chalk heart after the words. Sunlight is shining in through the window.

Level 2: non-real scene + text generation.

Prompt: In an underwater scene, two owls are sitting at a beautiful dining table with candles lit in the middle of the table, enjoying a delicious dinner together. The owl on the left is wearing a tuxedo, and the owl on the right is wearing a beautiful dress. A submarine passes by in the background with the words "What a Hoot" painted on its side. There are small jellyfish swimming at the bottom of the image under the table. A beautiful, cinematic digital artwork.

Level 3: real scenes in the real world.

Prompt: Photo of a beautiful street in Freiburg, with a tram passing by and people walking and riding bicycles.

Level 4: real people and cartoon characters.

Prompt: Photo of three ladies on a downtown street, with their hands extended toward the camera.

Prompt: Beautiful anime artwork of a cute cat girl who looks depressed and is holding a piece of paper with a smile drawn on it; she is about to cry.

Level 5: animal images.

Prompt: A bobcat in the forest, photographed by a professional photographer under strong light.

Prompt: A close-up rendering of a mythical creature, composed of detailed spiral fractals and tendrils, with a detailed recursive skin texture.

FLUX.1 Series Models

This time, Black Forest Labs released three models in the FLUX.1 series: pro, dev, and schnell.

FLUX.1 [pro]: The strongest model in the series.

The flagship of the FLUX.1 series, offering top-tier image generation with best-in-class prompt following, visual quality, image detail, and output diversity.

The Black Forest Labs team is gradually scaling up inference compute capacity for FLUX.1 [pro] in its API.

The model is also accessible through Replicate and fal.ai, and the company offers dedicated, customized enterprise solutions.

FLUX.1 [dev]: The "medium cup" of the series.

An open-weight, distilled model licensed for non-commercial use.

[dev] is distilled directly from [pro], with similar quality and prompt adherence, while being more efficient than a standard model of the same size.

You can try it out on Hugging Face, or directly on Replicate or fal.ai.

FLUX.1 [schnell]: A small whirlwind of speed.

The fastest model in the series, tailor-made for local development and individual developers.

FLUX.1 [schnell] is publicly available under the Apache 2.0 license. The model weights can be found on Hugging Face, and the inference code is on GitHub.

It is already supported in ComfyUI and can be used there directly; it is also available through Replicate or fal.ai.

Let's get a direct feel for the results!
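For readers who want to try the open weights locally, here is a minimal sketch rather than the team's official inference code. It assumes the Hugging Face `diffusers` library (a version that ships `FluxPipeline`), PyTorch, `accelerate`, and the `black-forest-labs/FLUX.1-schnell` repository on Hugging Face; none of these are named in the announcement itself. The helper names `schnell_settings` and `generate`, and the few-step settings they use, are illustrative assumptions.

```python
def schnell_settings(prompt, steps=4):
    """Build call settings for a few-step model.

    The values are assumptions chosen for illustration: a handful of
    denoising steps and classifier-free guidance switched off.
    """
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,
        "max_sequence_length": 256,
    }


def generate(prompt, out_path="flux-schnell.png", seed=0):
    """Load the pipeline, render one image, and save it to out_path."""
    # Heavy dependencies are imported here so the settings helper above
    # can be used without them installed.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",  # assumed Hugging Face repo id
        torch_dtype=torch.bfloat16,
    )
    # Keeps idle sub-models on the CPU to reduce GPU memory use
    # (requires the accelerate package).
    pipe.enable_model_cpu_offload()

    generator = torch.Generator("cpu").manual_seed(seed)
    image = pipe(generator=generator, **schnell_settings(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a slice of cake on a wooden table")` would download the weights on first use, so expect a large download and substantial memory requirements.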

Here are three photos showing the generated effects of the large cup, medium cup and small cup under different prompt words around the theme of "cake".

△ From left to right, the models used are large, medium and small cups.

After multiple tests, QbitAI found that with a simple prompt, the pro version takes between 15s and 25s to generate an image (the generation time is displayed below the result image).

Black Forest Labs says all FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion Transformer blocks, scaled to 12B parameters.

Of the three models, FLUX.1 [pro] and [dev] are said by the team to surpass Midjourney v6.0, DALL·E 3 (HD), and Stable Diffusion 3-Ultra in visual quality, prompt following, size/aspect ratio flexibility, typography, and output diversity.
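To give a rough sense of what 12B parameters means for running a model locally, the arithmetic below estimates the memory needed just to hold the weights at a few common numeric precisions. The bytes-per-parameter values are standard; treating the figure as exactly 12 × 10^9 parameters and ignoring text encoders, the image decoder, and activations are simplifying assumptions, so real usage will be higher.

```python
# Assumption: "12B" is taken as exactly 12 billion parameters.
PARAMS = 12_000_000_000

# Storage size of one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_gigabytes(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_gigabytes(PARAMS, size):.0f} GB")
# float32: ~48 GB
# bfloat16: ~24 GB
# int8: ~12 GB
```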

FLUX.1 [schnell] is described by the team as “the most advanced few-step model to date”.

Not only does it outperform comparable competitors in its class, it also surpasses more powerful non-distilled models such as Midjourney v6.0 and DALL·E 3 (HD).

The entire FLUX.1 family has been specifically fine-tuned to preserve the full output diversity of the pre-training phase.

Compared with existing technologies, FLUX.1 has the following advantages:

Someone will inevitably ask: you are all veterans and core members of Stability AI.

So what sets your new models apart from Stable Diffusion?

The founding team members responded on Reddit:

Even our weakest model, schnell, produces higher-quality images and generates them faster.

In short: start a new company in order to outdo your former self.

Created by SD's main authors

After introducing the model-related information, it’s time to formally get to know this new company.

That company is Black Forest Labs, announced just today.

On the company’s official website, there is a slogan: A new era of creation.

Company mission: to advance state-of-the-art, high-quality deep learning models for image and video generation and make them available to the widest possible audience.

Here is the key point! Their next ambition is obvious: to move into video generation.

And they say it has to be "SOTA".

Core member Robin Rombach is a former research scientist at Stability AI.

While working at Stability AI, he was one of the main developers of the Stable Diffusion model and also participated in the research of projects such as SDXL and SVD.

In March this year, Robin left Stability AI.

Outside observers said his departure dealt a serious blow to the already chaotic unicorn; after all, he was one of the two main authors of SD.

Looking back, Robin obtained his undergraduate and master's degrees in physics from Heidelberg University.

In 2020 he started his PhD in Computer Science under the supervision of Björn Ommer in the Computer Vision Group at Heidelberg, and in 2021 he moved with the research group to the University of Munich.

His research focuses on generative deep learning models, especially text-to-image systems.

His Google Scholar citation count is close to 15,000.

In addition, the members listed on the official website include Andreas Blattmann, Axel Sauer, Dominik Lorenz, Dustin Podell, Frederic Boesel, Patrick Esser, Sumith Kulal, Tim Dockhorn, Yam Levi, and Zion English. All of them are publicly known former members of Stability AI.

(No definite information on Andi Holmes and Jonas Müller has been found yet.)

In other words, Black Forest Labs is SD's original core team leaving and setting sail again.

No wonder Axel Sauer retweeted the company's official post and exclaimed:

We are still alive!

One More Thing

Coincidentally, on the same day, Stability AI also made a new move:

It introduced a new AI model, Stable Fast 3D, which the company says can generate 3D images in half a second.

Previous models took several minutes to generate 3D images of similar quality, while the new model completes the same task 1,200 times faster.

And what is Emad Mostaque, the CEO who left Stability AI in March, doing now?

In June, he officially announced his new company, Schelling AI, which "will build and support open source code, models, and datasets supported by AI funding."
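The two figures quoted here are consistent with each other, as a quick check shows. The sketch below takes the half-second and 1,200× numbers from the text and derives the baseline time they imply; the function name is illustrative.

```python
def implied_baseline_seconds(new_seconds, speedup):
    """Time the older model would need, given the new time and the speedup factor."""
    return new_seconds * speedup


baseline = implied_baseline_seconds(0.5, 1200)
print(f"{baseline:.0f} s = {baseline / 60:.0f} min")
# 600 s = 10 min
```

An implied baseline of about ten minutes fits the description of earlier models taking several minutes per result.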

The company emphasizes innovative research and carefully crafted AI that is culturally aware, scientific, educational, and creative.

Three days ago, Schelling AI published the first article in a series, “How To Think About AI”.

The article is rather long, and interested readers can look it up themselves. Here is just the core idea:

AI is developing rapidly, and we advocate open source and accelerate innovation and collaboration.

Besides, we are all decent people!

The tweet announcing the founding of Black Forest Labs was retweeted by the team's former CEO (doge emoji goes here).

Reference Links:
[1]https://blackforestlabs.ai
[2]https://news.ycombinator.com/item?id=41130620
[3]https://x.com/EMostaque
[4]https://www.reddit.com/r/StableDiffusion/comments/1eds0l9/does_anyone_have_an_update_on_when_stable/
[5]https://x.com/SchellingAI/status/1818600200232927721
