
AI beauties all defeated! Under the cyber "demon-revealing mirror" they all show their true form, as AI-written code cracks AI-generated photos

2024-08-12


  • Mingmin, Cressy | from Aofei Temple
    QbitAI | WeChat public account QbitAI

Under the cyber demon-revealing mirror, AI beauties all show their true form.

Look at its teeth



Crank the image's saturation up to the maximum, and the AI portrait's teeth become very strange, with blurred boundaries.

The color of the overall picture is normal, but the microphone part is even weirder.

Compare a real person's photo; it should look like this.

The teeth are clear and the color blocks in the picture are uniform.



The tool is now publicly available, and anyone can try it on their own photos.

Even a single frame in an AI-generated video cannot escape this rule.



Photos that don't show teeth can also reveal problems.



By the way, this tool was written with Claude. Using AI to crack AI, a wonderful closed loop.



To be honest, overly realistic AI portraits have sparked quite a bit of discussion lately. In one popular set of "TED speaker" videos, for example, not a single speaker is a real person.



Not only are the faces hard to tell apart; even text, previously one of AI's weak points, can now pass for the real thing.



More importantly, generating such AI portraits is not expensive: a 20-second clip costs as little as US$1.50 (about 10 RMB) and takes around 5 minutes to make.



Netizens could no longer sit still and started a contest to spot the AI fakes.

Nearly 5,000 people came to discuss which of the two pictures was of a real person.



The reasons given vary. Some people find the details of the words and patterns too abstract, while others think the eyes of the characters are too empty...

The rules that the most advanced AIs use to generate portraits have gradually been figured out.

It's hard to tell without looking at the details

In short, cranking up the saturation may be the quickest way to spot them.

AI group portraits are more thoroughly exposed using this method.
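
The article does not share the tool's code, but the basic trick is easy to reproduce. Below is a minimal sketch in Python using Pillow; the file names and the saturation factor are placeholder assumptions, not values taken from the original tool.

```python
# Minimal sketch of the saturation check, assuming Pillow is installed.
from PIL import Image, ImageEnhance

def boost_saturation(path: str, factor: float = 5.0) -> Image.Image:
    """Return a copy of the image with saturation pushed far past normal.

    factor=1.0 keeps the original colors; large values exaggerate color
    boundaries, which is where AI portraits (teeth, microphones, badges)
    tend to fall apart visually.
    """
    img = Image.open(path).convert("RGB")
    return ImageEnhance.Color(img).enhance(factor)

if __name__ == "__main__":
    # "portrait.jpg" is a placeholder file name.
    boost_saturation("portrait.jpg").save("portrait_saturated.jpg")
```

Pillow's ImageEnhance.Color blends the image with its grayscale version, so pushing the factor well above 1 simply amplifies every color boundary, including the muddled ones AI leaves around teeth.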



However, there is a problem. If the image is compressed using the JPEG algorithm, this method may not work.

For example, this photo is confirmed to be of a real person.



However, due to image compression and lighting issues, the character's teeth are a bit blurry.
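
To see why compression gets in the way, you can re-save a photo at an aggressively low JPEG quality and run the same saturation check on both copies. The sketch below again assumes Pillow; "real_photo.jpg" and the quality setting of 20 are illustrative placeholders.

```python
# Re-save a photo with heavy JPEG compression, then saturate both versions
# to compare how much fine detail (e.g. around the teeth) survives.
from PIL import Image, ImageEnhance

Image.open("real_photo.jpg").convert("RGB").save(
    "real_photo_q20.jpg", "JPEG", quality=20  # deliberately low quality
)

for name in ("real_photo.jpg", "real_photo_q20.jpg"):
    boosted = ImageEnhance.Color(Image.open(name).convert("RGB")).enhance(5.0)
    boosted.save(name.replace(".jpg", "_saturated.jpg"))
```

In the heavily compressed copy, JPEG's block artifacts and chroma smearing can blur edges in a way that looks a lot like the smudged teeth of an AI portrait, which is exactly the ambiguity described here.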



Therefore, netizens have also listed more methods to determine whether a portrait is AI-synthesized.

The first method, simply put, is to rely on human knowledge and judgment.

Since the way AI learns images is different from that of humans, it is inevitable that it cannot 100% grasp the visual information from the human perspective.

As a result, AI-generated images often contain details that are inconsistent with the real world, and that gives identification a place to start.

Take the image at the beginning as an example.

Overall, the character's skin is too smooth and no pores can be seen. This overly perfect feature actually adds to the sense of unreality.

Of course, this "unreality" is not exactly the same as "fake"; after all, pores are not visible in photos that have been through skin-smoothing filters either.

But this is not the only basis for judgment; a single picture may clash with common sense in more than one place.



In fact, looking closer at this picture reveals a more obvious flaw: the hook above the badge is attached in a rather peculiar way.



There are also microphone flaws in high saturation mode, which can be seen with the naked eye after zooming in.



Even more subtle, the placement of a few strands at the ends of the hair is unreasonable, though spotting a feature like that probably takes Leeuwenhoek-level, microscope-grade eyesight.

However, with the advancement of generation technology, it is an inevitable trend that the features that can be found are becoming more and more hidden.



Another method is to look at the text. Although AI is gradually overcoming its "illegible scribble" problem when rendering fonts, it still struggles to produce text that actually carries the right meaning.

For example, some netizens noticed that on the badge worn by the person in the photo, the last line of text below the Google logo ends in the two letters "CA", the abbreviation for California, so the long string before it should be the name of a city.

But in fact, there is no city in California with such a long name.



In addition to the details of the objects themselves, information such as light and shadow can also be used to determine authenticity.

This picture was extracted from a video, and there is another frame like this in the video.

On the right side of the microphone there is a very strange shadow corresponding to one of the person's hands; the AI's handling here clearly falls short.



When it comes to video, AI gives itself away even more easily than in still images, because the content has to stay consistent across frames.



There are also some features that are not considered "common sense errors", but they also reflect some preferences of AI when generating images.

For example, these four pictures are all of “average people” synthesized by AI. Have you noticed anything in common?



Some netizens said that none of the people in these four pictures are smiling, which seems to reflect some characteristics of AI-generated pictures.



This is indeed the case for these pictures, but it is difficult to form a system with this judgment method. After all, different AI drawing tools have different characteristics.

In short, to keep up with AI's steady progress, we can on the one hand look even harder, Leeuwenhoek-style, and on the other hand bring in image-processing tricks such as boosting saturation.

However, as these incremental improvements accumulate, judging with the naked eye will only get harder, and even the saturation trick may one day be defeated by AI.

So people are also changing tack and turning to "fighting models with models": training detection models on AI-generated images so they can extract more features from a picture than people can.

For example, AI-generated images have many telltale characteristics in their frequency spectrum and noise distribution. These are invisible to the naked eye, but an AI detector can see them very clearly.
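
As a rough illustration of what "looking at the spectrum" means, the sketch below computes the log-magnitude 2D Fourier spectrum of an image with NumPy and Pillow. It only visualizes the spectrum; an actual detector would be a classifier trained on many real and generated images, and the file names here are placeholders.

```python
# Visualize an image's frequency spectrum, where some generators leave
# periodic, grid-like artifacts that are invisible in the pixel domain.
import numpy as np
from PIL import Image

def log_spectrum(path: str) -> np.ndarray:
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    spec = np.fft.fftshift(np.fft.fft2(gray))   # move zero frequency to the center
    return np.log1p(np.abs(spec))               # compress the huge dynamic range

if __name__ == "__main__":
    s = log_spectrum("suspect.png")             # placeholder file name
    s = (255 * (s - s.min()) / (s.max() - s.min())).astype(np.uint8)
    Image.fromarray(s).save("suspect_spectrum.png")
```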

Of course, it cannot be ruled out that a detection method goes stale and fails to keep up with model changes, or that model developers deliberately develop against the detectors.

For example, for the picture discussed above, one AI detection tool put the probability of AI synthesis at only 2%.



But the game between AI fraud and AI detection is itself a "cat and mouse game."

Therefore, beyond detection, model developers may also need to shoulder some responsibility, for example by adding invisible watermarks to AI-generated images so that AI fakery has nowhere to hide.
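
As a toy illustration of the watermark idea (not the scheme any real model developer ships), the sketch below hides a short tag in the least significant bit of an image's blue channel using NumPy and Pillow. Production watermarks are designed to survive cropping, resizing and re-compression; this one is only meant to show the principle, and the tag and file names are arbitrary placeholders.

```python
# Toy least-significant-bit (LSB) watermark: embed and read back a short tag.
import numpy as np
from PIL import Image

TAG = "AI-GENERATED"  # arbitrary example payload, 13 characters = 104 bits

def embed(path_in: str, path_out: str, tag: str = TAG) -> None:
    """Hide `tag` in the LSBs of the blue channel (image must have >= 8*len(tag) pixels)."""
    img = np.array(Image.open(path_in).convert("RGB"))          # H x W x 3, uint8
    bits = np.array([int(b) for byte in tag.encode() for b in f"{byte:08b}"], dtype=np.uint8)
    flat = img.reshape(-1, 3)                                    # view onto img
    flat[:len(bits), 2] = (flat[:len(bits), 2] & 0xFE) | bits    # overwrite blue LSBs
    Image.fromarray(img).save(path_out)                          # use a lossless format, e.g. PNG

def extract(path: str, n_chars: int = len(TAG)) -> str:
    """Read back `n_chars` characters from the blue-channel LSBs."""
    flat = np.array(Image.open(path).convert("RGB")).reshape(-1, 3)
    bits = flat[:n_chars * 8, 2] & 1
    chars = [int("".join(str(b) for b in bits[i:i + 8]), 2) for i in range(0, len(bits), 8)]
    return bytes(chars).decode(errors="replace")
```

Usage would be embed("photo.png", "photo_marked.png") followed by extract("photo_marked.png"); note the output must be saved losslessly, since JPEG compression would destroy the hidden bits.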

AI is a powerful tool

It is worth mentioning that many of the AI portraits causing this panic were produced, in whole or in part, by the recently popular Flux.

People have even started assuming by default that anything too good to tell apart was made with Flux.



It was created by the original team of Stable Diffusion and caused a sensation on the Internet just 10 days after its release.

These beautiful fake TED speech photos are all from it.



Some people have used Flux and Gen-3 together to create beautiful skin care product advertisements.



And various synthesis effects from multiple angles.



It also fixes AI's notorious problems with hands and with text inside images.



As a direct result, people can no longer spot AI images just by glancing at the hands or the text; they can only guess from subtler clues.



Flux appears to have had extra training focused on weak points like hands and text.

This also means that if today's models keep pushing their training on texture detail, color and the like, human recognition tricks may well stop working again when the next generation of image models arrives...

Moreover, Flux is open source and can run on a laptop; it has already made many people forget about Midjourney.

It took 2 years to go from Stable Diffusion to Flux.

It took 1 year to go from "Will Smith eating spaghetti" to "TEDx speaker".

I really don’t know what tricks humans will come up with in order to distinguish AI generation in the future...

Reference links:
[1] https://x.com/ChuckBaggett/status/1822686462044754160
[2] https://www.reddit.com/r/artificial/comments/1epjlbl/average_looking_people/
[3] https://www.reddit.com/r/ChatGPT/comments/1epeshq/these_are_all_ai/
[4] https://x.com/levelsio/status/1822751995012268062