
AI image review sparks controversy: Mickey Mouse smokes, Trump kisses Harris, SpongeBob wears a Nazi uniform

2024-09-04


Zhidongxi (public account: zhidxcom)

Compiled by Xu Yu

Editor | Mo Ying

According to a report by The Wall Street Journal on September 2, a number of humorous and politically misleading AI images have recently circulated on the social media platform X, such as "Trump and Harris embracing", "Obama taking cocaine" and "Mickey Mouse holding a gun". The content of these pictures has left users confused and uncomfortable.

These AI images were generated by large AI models such as Grok-2 and Gemini. Grok-2 was developed by xAI, the American AI large-model unicorn founded by Musk, while Gemini comes from American technology giant Google.

Recently, Google has been improving its review mechanism for AI-generated image content to prevent its AI from outputting content that is biased, ambiguous, erroneous, racist, or inconsistent with historical facts.

In response to similar issues, OpenAI, the American AI large-model unicorn, previously banned the use of its AI to generate images of specific people in order to strengthen its review of AI-generated images.

1. xAI's new-generation large model Grok-2 tacitly allows politicians to be spoofed

Open the social media platform X (formerly Twitter) and you may see these "shocking" pictures: Mickey Mouse drinking beer, SpongeBob wearing a Nazi uniform, and former US President Donald Trump kissing US Vice President Kamala Harris.

These confusing and uncomfortable images were generated with new generative AI models or software from xAI and Google.

▲An AI-generated image shows Trump "holding" Harris in his arms, with clearly rendered faces (Source: The Wall Street Journal)

On August 14, xAI launched its next-generation large language model, Grok-2. Within days of the model's release, the X platform was flooded with images said to be generated by Grok-2. In these images, "arch-enemies" Trump and Harris appear intimate, and the cartoon character Mickey Mouse holds a gun and smokes. These images, produced with generative AI technology, not only damaged the image of politicians but also had copyrighted characters performing offensive actions. "If Disney saw this, it might not laugh."

Grok-2's image generation is supported by Black Forest Labs, a German AI image and video generation startup, and the feature is currently only available to paid subscribers of the X platform.

According to the X platform's policy, users are prohibited from sharing content that may confuse or mislead people about facts, is deliberately forged, or may ultimately cause personal or property damage. Later on the day Grok-2 was released, although some of the offending AI images could no longer be found on the X platform, users could still use Grok-2 to generate new works full of "bad taste".

However, Musk, who controls the X platform, does not seem to mind this kind of political spoofing. In July this year, he retweeted a deepfake video of Harris in which "Harris" calls herself "the ultimate diversity hire".

Content moderation experts say that such generative AI tools may produce false information during the US election cycle and even spread it through society.

On August 19, the day before the opening of the 2024 Democratic National Convention, Trump posted an image suspected to be AI-generated. By that point, sitting US President Biden had abandoned his re-election bid, and Harris, the subject of the image, had already secured the Democratic presidential nomination following a party vote.

The image depicts "Harris giving a speech in Chicago" against a red flag bearing a hammer-and-sickle emblem, seeming to imply that Harris is a communist, and thus sparked political controversy.

2. Google's Gemini model stumbles repeatedly, confusing black and white on sensitive elements

Google's Gemini chatbot is powered by its namesake large language model, Gemini.

Before launching the new version of the Gemini chatbot in February this year, Google tuned the Gemini model so that it would return more diverse and less stereotypical images of people when given prompts involving human figures.

For example, when asked to generate images of doctors, AI models usually tend to produce images of white men. Google hoped to reduce the "bias" of AI image generation models through "diversification". A sketch of how such a rewrite step can work follows.
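Mechanically, this kind of "diversification" is often implemented as a rewrite step that injects demographic attributes into the user's prompt before it reaches the image model. The following is a minimal sketch of that idea in Python; the attribute pools, term list and function name are hypothetical illustrations, not Google's actual implementation:

```python
import random

# Hypothetical attribute pools; a production system would be far more nuanced.
GENDERS = ["male", "female"]
ETHNICITIES = ["white", "Black", "Asian", "Hispanic", "Middle Eastern"]

# Hypothetical terms suggesting the prompt depicts a person.
PERSON_TERMS = {"doctor", "nurse", "scientist", "teacher", "soldier"}

def diversify_prompt(prompt: str) -> str:
    """Append random demographic attributes to prompts about people.

    Because the rewrite is applied blindly, with no awareness of
    historical or religious context, it can yield depictions that
    could never have existed in the requested scene.
    """
    words = set(prompt.lower().split())
    if words & PERSON_TERMS:
        attrs = f"{random.choice(ETHNICITIES)} {random.choice(GENDERS)}"
        return f"{prompt}, depicted as a {attrs} person"
    return prompt  # leave non-person prompts untouched

print(diversify_prompt("a portrait of a doctor"))
```

The sketch also exposes the failure mode described next: a context-blind injection treats "a 1943 German soldier" the same as "a doctor", which is exactly how historically inaccurate people can end up in the output.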

But within a month, the Gemini model ran into serious trouble. When generating images with "racial diversity", the model mismatched religion, race, gender and other attributes, producing numerous images of people that did not conform to historical facts. After heavy criticism from netizens, Google suspended the Gemini model's image generation feature, putting the "brakes" on the potential risks of AI image generation.

Sissie Hsiao, the Google vice president who heads the Gemini chatbot, said in a recent interview that ensuring AI models follow user instructions is a principle Google adheres to. "This is Gemini for users, and we serve users."

Nevertheless, some images generated by the Gemini chatbot still contradicted historical facts, and many X platform users posted screenshots questioning the content review capabilities of Google's model.

In response to the Gemini model's output of offensive and racially biased content, Google CEO Sundar Pichai said that "this is unacceptable" and that the company would "fully address this issue".

However, Google recently angered users again over inaccurate content in images generated with its AI technology.

In mid-August, Google launched its latest generation of smartphones, the Pixel 9 series, which introduced an AI photo-editing tool called "Reimagine" that lets users modify the content of photos by entering text prompts.

However, some users found that Reimagine let them add prohibited elements, such as making SpongeBob "wear" a Nazi symbol. This content-safety loophole aroused users' disgust.

A Google spokesperson said the company is "continuously strengthening and improving the existing security safeguards in place for our AI models".

Google revealed at the end of August that its AI chatbot Gemini will relaunch the feature for generating images of people, initially available only to English-language users with paid subscriptions. Google said it has made "significant progress" in reviewing AI-generated images, but that it is "impossible for every image generated by Gemini to be accurate".

3. The ethical and legal boundaries of AI-generated images need industry benchmarks

Currently, AI image generation software is constantly testing the bottom line of social media platform policies. This has triggered debate and reflection: should technology companies exert control over the content output by cutting-edge AI image generation software, and how should they review it?

Before opening generative AI technology to netizens for free-form creation, AI image generation software should be equipped with effective safety safeguards to ensure that AI-generated works do not violate regulations or ethical standards. This is the pressure technology companies face in AI content review.

Beyond ethical dilemmas, the developers behind AI models and software may also bear considerable legal liability, because the data used to train their models allegedly infringes intellectual property and other rights.

In 2023, artists filed a class action lawsuit against the AI image startups Stability AI and Midjourney for alleged infringement. The suit also targets DeviantArt, Runway and other companies with AI image generation models.

In addition to the artists' class action, Stability AI faces a lawsuit from Getty Images, the American visual media company, which accuses Stability AI of infringing its rights by training models on its images. A Getty Images spokesperson said the company has since launched its own AI image generation model.

OpenAI launched its AI image generation model DALL-E in 2022. After the artists' class action last year, OpenAI added a new option to the DALL-E interface that allows creators to opt out of having their uploaded images used to train the next generation of DALL-E models.

News Corp, the parent company of The Wall Street Journal, has signed a content licensing agreement with OpenAI, under which OpenAI can access and use News Corp's existing media content within agreed limits.

"we will eventually figure this out," said geoffrey lottenberg, an intellectual property protection lawyer. such legal disputes involving ai intellectual property rights may set a precedent for the legal boundaries of ai. then, other ai companies will have a reference standard for what pictures, videos and other data they can use when training their models and chatbots.

Conclusion: Google and OpenAI actively correct mistakes, while xAI does the opposite

The ability of AI image generation software to produce images of specific, well-known people is one of the main points of contention in this round of controversy over AI content review.

Several technology companies, including Google and OpenAI, have banned the use of their AI image generation software to create works containing specific people and easily recognizable characters.

Owing to Musk's insistence on free speech, xAI's Grok-2 model retains the ability to generate images of specific people and characters, a move that has drawn criticism from technology industry regulators.

Sarah T. Roberts, a professor at the University of California, Los Angeles who studies content moderation, believes users will use cutting-edge AI technology to deepfake videos, audio and photos in order to spread false information.

Roberts added that all the problems of traditional social media persist in generative AI and are harder to detect; in particular, AI-generated visual content such as images can sometimes be more convincing.

Pinar Yildirim, a professor at the University of Pennsylvania, said platforms try to set rules, such as banning keywords, to prevent abuse of AI technology, but users can also find loopholes to bypass these restrictions and get the content they want. "Users will become smarter and will eventually be able to create illegal content by taking advantage of loopholes," Yildirim said.
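Keyword banning of the kind Yildirim describes is typically a simple blocklist check on the prompt, which is part of why it is so easy to bypass. A minimal sketch in Python (the blocklist and function name are hypothetical, not any platform's actual filter) makes the weakness concrete:

```python
# Hypothetical banned keywords; real blocklists are larger but share the flaw.
BLOCKLIST = {"nazi", "swastika"}

def is_allowed(prompt: str) -> bool:
    """Reject prompts containing a banned keyword (naive exact-token match)."""
    tokens = prompt.lower().split()
    return not any(token in BLOCKLIST for token in tokens)

print(is_allowed("spongebob in a nazi uniform"))          # False: blocked
print(is_allowed("spongebob in a n4zi uniform"))          # True: misspelling slips through
print(is_allowed("spongebob in a 1940s german uniform"))  # True: paraphrase slips through
```

Because the image model understands paraphrases that the blocklist does not, any fixed keyword rule leaves exactly the loopholes Yildirim predicts.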

Source: The Wall Street Journal