2024-08-18
Author: Xuushan, Editor: Manman Zhou
“Many people expect it to be the next Midjourney.”
This may be the fastest-executing AI startup in history.
Just 15 days after its founding, AI startup Black Forest Labs had secured $32 million in seed funding and released FLUX.1, a series of large text-to-image AI models.
Not only that, Musk's newly released AI model Grok-2 was able to quickly roll out a text-to-image feature with its support, drawing millions of users into trying it.
And unlike the image generation capabilities of other AI models, the images generated on Grok-2 have almost no restrictions and are quite realistic.
Whether you want Steve Jobs playing with a cat, or Zuckerberg and Elon Musk meeting in person in an "octagon" cage, Grok-2 can grant your wish. The model clearly performs strongly in semantic understanding, prompt alignment, and image generation (though not in safety).
What is this company's background? How did it attract so many users, to the point that even Musk chose it to power his core product? After an in-depth investigation, Silicon Rabbit lifts the veil on Black Forest Labs.
01
The founding of Black Forest Labs traces back to another AI unicorn, Stability AI.
In fact, the current 15-member startup team of Black Forest Labs all came from Stability AI. It can be said that the establishment of Black Forest Labs was a collective exodus of employees.
Robin Rombach, founder of Black Forest Labs, was a former research scientist at Stability AI and one of the two core pillars of Stability AI.
He studied physics at the University of Heidelberg and began his PhD in the university's computer vision group in 2020. Robin has focused on deep learning models, especially text-to-image generation, and moved with the research group to the University of Munich in 2021.
During his time at Stability AI, he led the development of the text-to-image AI model Stable Diffusion. In its early days, Stable Diffusion was regarded as the leader in AI image generation and caused a stir in the industry. Stability AI's valuation broke through $1 billion, vaulting the company into the ranks of AI unicorns and into the limelight.
However, Stability AI's fortunes took a sharp turn for the worse in 2024. According to reports, its annual costs were about $99 million while its revenue was only $11 million, a serious imbalance. Subsequently, in March this year, Stability AI's former CEO Emad Mostaque took at least 19 executives away from the company.
Robin Rombach also began to look for a way out. Black Forest Labs was a new beginning for him, and also a new starting point for many former employees of Stability AI. When Black Forest Labs was established, many Stability AI employees excitedly said, "We're live!"
Currently, the FLUX.1 series comes in three versions, spanning both open source and closed source. FLUX.1 [pro] is the most powerful, closed-source version, designed for professional applications that demand top performance. FLUX.1 [dev] is an open-source model that is more efficient while maintaining image quality and prompt adherence, but it is not licensed for commercial use. FLUX.1 [schnell] is an open-source version designed for local development and personal use; it is the fastest of the three and requires the least memory.
All three models have been released in trial versions on Replicate and Hugging Face. Downloads on the Hugging Face platform have exceeded 200,000, and FLUX.1 [schnell] has been downloaded more than 580,000 times, with 380 million trial runs.
Try it (registration required): FLUX.1 [schnell]: https://replicate.com/black-forest-labs/flux-schnell
02
Although the FLUX.1 series models are created by the original team of Stable Diffusion, this does not mean that they are a copy of Stable Diffusion.
Media outlets have compared Flux, SD3 Medium, Auraflow and Midjourney side by side, and the results show that today's leading text-to-image models produce quite different images from the same text prompt.
First up is prompt one: "Hand-drawn illustration of a giant spider chasing a woman through the jungle, extremely horrific, distressing, dark and creepy scene with a scary, suggestive atmosphere."
As you can see, Flux makes good use of lighting and shadows to create a sense of horror. The spider design is indeed scary, the spider legs are sharp, and the spider's face is also realistic. Auraflow's cyan tone does not achieve the dark and scary effect, and the overall picture is stylized. SD3 Medium's black and white style gives a strong sketch-like feeling. The spider design is detailed and scary, but there is some inconsistency in the characterization.
The second test mainly examines the image generators' ability to understand space. The text prompt is: "A dog stands on top of a TV with the word 'decryption' displayed on the screen. On the left is a woman in a suit holding a coin in her hand, and on the right is a robot standing on a first aid kit. The overall scene is surreal."
The image generated by Flux is closest to the description, with every element placed in the required position. The overall composition is balanced, and the design of each element and the retro-futuristic style meet the requirements of surrealism. It does have shortcomings, such as a character with an extra hand. SD3 Medium ranks second: its overall design also matches the text description, but it falls short on accuracy; for example, the cartoon-style dog is sitting when it should be standing. Auraflow lags behind in both the accuracy of its text understanding and the quality of its images.
Prompt three reads: "A high-resolution photo of a busy city street at night. Neon lights illuminate the scene, people walk along the sidewalk, cars drive by, street vendors sell hot dogs, and the lights are reflected on the wet road. The overall style is hyperrealistic, with attention to detail and lighting. The neon sign reads 'Decrypted'." This prompt mainly tests how well each image generator handles realism.
Flux produces a detailed, well-lit image that does a good job of depicting a busy street, with the key signs clearly visible and the pedestrians distinct. SD3 Medium also manages a balanced composition, realistic lighting, and well-integrated elements, but its depiction of pedestrians is a little thin.
Finally, the media outlet Decrypt also ran two tests comparing Flux with Midjourney, and judged Flux to be the stronger.
The first prompt reads: "A black and white photo of a woman with long straight hair sitting on the floor in front of a modern sofa, wearing an all-black outfit that accentuates her curves. She poses confidently for the camera, revealing her long legs as she squats. The background is minimalist, highlighting her elegant posture against the stark contrast of the light grey wall and dark outfit. Her expression exudes confidence and sophistication. Shot by Peter Lindbergh using a Hasselblad X2D 105mm lens at an aperture setting of f/4. ISO 63. Professional color grading enhances the visual appeal."
Decrypt believes that Flux captures the requirements of the prompt with a natural pose, a fitting background, and detailed rendering, and that it is the most anatomically accurate. Midjourney produces a vivid image rich in detail, but it lacks Flux's sense of depth, and its body posture is not as accurate.
The second prompt reads: "A white cat playing piano, wearing sunglasses and a hat, and purple Hawaiian-style clothing, full body shot against a gray studio background, for commercial use."
Decrypt believes that Flux meets the requirements of full-body shots, gray studio backgrounds, and designated clothing, and the composition is professional and exquisite, which fully meets the prompt requirements. Midjourney provides close-up shots, and the image is expressive, but it does not meet the requirements of full-body shots and studio backgrounds.
These tests suggest that Flux has taken an industry lead in photographic detail, spatial understanding, and stylization. It can compete with Midjourney and is even stronger in some respects.
03
Text-to-image AI is arguably one of the most competitive tracks in generative AI today. Google, Meta, and OpenAI all have their eyes on this field. The capabilities demonstrated by FLUX.1 have led many people to expect it to become the next Midjourney.
But the key to becoming the next Midjourney lies in commercialization.
Midjourney, a pioneer in the same field, has a basic plan costing $96 per year, which can generate about 200 images per month, equivalent to 25 images per dollar. Ideogram's basic plan costs $84 per year, which can generate up to 400 images per month, or 50 images per dollar.
Black Forest has partnered with Fal AI, the developer of the open source model Auraflow, to support cloud generation. The models can also be tested for free on Replicate.com. Once users reach the daily free quota, they can spend $1 to generate 33 images with the FLUX.1 [pro] model, or $1 to generate 333 images with FLUX.1 [schnell].
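The price comparison above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the `Plan` class and its field names are made up for this example, and the prices and image counts are simply the figures quoted in this article, with annual prices divided by 12 to get a monthly cost. Under that assumption Midjourney comes out at 25 images per dollar, while Ideogram's quoted annual price works out to roughly 57 images per dollar, a little higher than the 50 quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical helper: a price and the number of images it buys."""
    name: str
    usd: float    # amount paid for the period (here: one month, or $1 of credit)
    images: int   # images obtainable for that amount

    @property
    def images_per_dollar(self) -> float:
        return self.images / self.usd

    @property
    def cost_per_image(self) -> float:
        return self.usd / self.images


# Figures as quoted in the article; annual plans are converted to monthly cost.
plans = [
    Plan("Midjourney basic", usd=96 / 12, images=200),
    Plan("Ideogram basic", usd=84 / 12, images=400),
    Plan("FLUX.1 [pro] on Replicate", usd=1.0, images=33),
    Plan("FLUX.1 [schnell] on Replicate", usd=1.0, images=333),
]

# Print the plans from cheapest to most expensive per image.
for plan in sorted(plans, key=lambda p: p.images_per_dollar, reverse=True):
    print(f"{plan.name:32s} {plan.images_per_dollar:6.1f} images/$ "
          f"(${plan.cost_per_image:.4f} per image)")
```

Seen this way, FLUX.1 [schnell] is by far the cheapest option per image, while FLUX.1 [pro] is priced in the same range as a Midjourney subscription. The comparison ignores the free daily quota and any differences in image quality.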
Compared with Midjourney and Ideogram, Black Forest gives users more choices. But this does not mean that Black Forest has succeeded commercially. Maintaining a generative AI model is very expensive. Take Stability AI as an example: according to Forbes, it spends about $8 million per month on costs and wages, while its revenue is only $1.2 million, far short of its costs. Today, commercialization has also become a bottleneck for Ideogram and Pika Labs.
Therefore, if Black Forest wants to truly surpass Midjourney, how it balances income and expenses will be the key to whether it can dominate the field of large text-to-image models.
04
Black Forest Labs and Musk seem to agree on building an "anti-woke AI chatbot", and neither wants to impose too many restrictions on AI.
An "anti-woke AI chatbot" here means an AI chatbot that deliberately avoids adopting politically correct or "woke" positions and does not filter its output when faced with controversial topics. Grok is clearly the vehicle for Musk's "anti-woke AI chatbot" concept.
In terms of safety, although Grok lists six restrictions of its own, including content limits, copyright, and image processing complexity, judging from the generated images Grok has almost no taboos. All kinds of generated images involving celebrities, pornography, and violence have begun to flood the social platform X.
Although many regulators have expressed dissatisfaction with the social platform X, Musk seems unconcerned. After the release of Grok-2, Musk also allowed users to post images generated by Grok directly on the platform without any watermark or label marking them as AI-generated or Grok-generated.
Musk said on the social platform X in 2022 that imposing limits on AI would make AI models less safe: "The danger of training AI to be woke - in other words, lie - is deadly." Some media speculated that Musk chose to connect Grok to the FLUX.1 series precisely because those models do not impose too many restrictions.
According to reviews by The Verge and several other media outlets, comparable AI models such as Google's Imagen and OpenAI's DALL·E 3 refused to generate images for the same "dangerous" prompts, whereas Grok responded quickly and produced the images.
Just half a month ago, when Black Forest Labs was first established, it announced that the company's goal was to "enhance people's trust in the safety of these models." Half a month later, Black Forest Labs was standing with Musk on the side of "no restrictions on AI" and had opened the black box of text-to-image AI models.
Faced with these controversies, Black Forest Labs has so far avoided addressing them and has tried to steer the discussion elsewhere. On August 14, its board member Anjney Midha criticized Google on the social platform X, saying that Gemini's image generation had hidden racial bias problems when it was first released, and stated that the FLUX.1 series models would not have such issues.
In terms of image generation capability, the FLUX.1 series models are indeed powerful and can compete with Midjourney. In terms of safety, however, Black Forest Labs seems to have chosen a different path from other players in the same field.
Will the "no safety guardrail" policy make Black Forest Labs the undisputed leader in text-to-image generation? Or will it ruin the popularity of the FLUX.1 series models? We will have to wait and see.