2024-09-29
The Three Sheep incident caused quite a stir. Unexpectedly, the victim turned out to be AI.
The night before last, Hefei police issued a notice on the "recording incident involving Lu Wenqing, founder of Three Sheep Group", stating that the widely circulated audio was generated by AI and that the suspect had been placed under criminal coercive measures in accordance with the law.
The notice settled the matter. It not only gave an official position but also amounted to a slap in the face for the so-called "first person in domestic AI" who had been widely quoted online a few days earlier. After all, that person's verdict at the time was that "AI voice cloning technology is not yet that silky smooth."
More surprising still, an AI company stepped forward yesterday to "issue a statement", saying that the suspect had produced the audio with the company's self-developed large AI dubbing model.
Netizens were stunned: even after all this, the company still did not forget to advertise. Is the AI industry really this cutthroat? We traced the statement to its source through the company it names and found the relevant content on a Weibo account of the same name. However, the account has not been officially verified, so we cannot draw a firm conclusion.
Still, discussion of the statement keeps growing. Netizens have called it "suicide marketing", and some curious users have asked whether the company's voice cloning product is really that powerful: "Has anyone tried this? Let's test the website."
So we gave it a try. With the names of the company and product withheld, we ran some hands-on tests. Note that the following tests are for popular science purposes only. The value of a tool lies in how people use it, and we will never support anyone using AI to test the boundaries of the law.
We also consulted lawyers about whether there are precedents for this kind of AI voice cloning infringement, and about what legal issues creators and platforms should watch for when using or promoting new technologies, for your reference.
AI clones a person's voice with just a few seconds of audio
Enter the text, assign roles, let the tool split the text sentence by sentence, and generate with one click.
After opening the product page, we followed these steps, and it took only one minute to get "Jiang Wen" to read Liu Zi's lines from "Let the Bullets Fly":
"Dad, I have searched everywhere. There is no money, no goods, and no silver. Only two people are left alive. Should we kill them or not?"
With this cadence and tone, anyone who did not know better might think the role of Liu Zi was played by Jiang Wen. In fact, Liu Zi is the son in the movie, and Jiang Wen plays Liu Zi's father.
This audio was generated with the product's "Jiang Wen" voice character.
The product currently offers many voice characters, including well-known internet celebrities such as "Sun Xiaochuan" and "Ding Zhen", as well as stars from culture and sports such as "Kobe Bryant" and "Jay Chou".
These voice characters were all uploaded by community users. Clicking on the platform's official characters only displays "Coming soon, stay tuned."
Besides using voice characters uploaded by community users, it is also easy to clone a celebrity's voice on the platform yourself.
Here we uploaded a real interview recording of Musk and had the AI Musk "personally" say "You swan, he frog!" (the toad wants to eat swan meat), a Chinglish phrase that has become very popular abroad.
The platform only requires the sample to be longer than 2 seconds, and sample quality matters more than length, so the most time-consuming step in cloning the voice was finding a clear recording of Musk.
According to the platform, this recording defines the character's default vocal performance, including timbre, emotion, speaking speed, intonation, and rhythm. If you want different voice styles for the same character, you can add further style samples to the voice character.
For this version we uploaded only one audio clip and used the platform's fast cloning mode rather than the paid professional cloning mode (which the platform claims restores timbre and emotion with up to 99.9% fidelity). Short sentences come out better, and the result is already roughly 60 to 70 percent similar to Musk's own voice.
In terms of content formats, generative AI has "invaded" text, audio, video, and even 3D content. Of these, audio is one of the more mature areas of application.
AI voice cloning is only one branch of AI audio generation. Other applications include AI-generated music and AI-generated sound effects.
AI voice cloning actually existed long before the advent of generative AI. Back then, cloning a voice relied on traditional TTS (text-to-speech) technology. It required building an AI voice library from a large number of voice samples and then manually tuning the output to approximate the human voice.
Alternatively, open source projects such as Bert-VITS apply more recent deep learning speech synthesis techniques to convert text directly to speech while reproducing the timbre, but the hardware and technical requirements are relatively high.
Image source: GPT-SoVITS tutorial by Bilibili uploader "henji weizi"
Today, amid the AIGC wave and fierce competition among AI tools, 10 seconds or less of sample audio is enough to reproduce a voice closely.
Earlier, we introduced the principles of AI voice cloning in a live broadcast. The process is generally divided into voice collection, feature extraction, model training, and speech synthesis. Related tools include Fish Audio, CosyVoice, ElevenLabs, and Jianying, all of which have lowered the threshold for voice cloning.
It is therefore technically feasible that the "Three Sheep recording scandal" was produced by AI. In the hands of someone with an agenda, AI generation can be combined with manual tuning, post-editing, and other methods to make a fake pass for the real thing.
On top of that, the circulating recordings contain a lot of complex background noise and present the speaker as drunk, which makes it much harder to judge their authenticity. It is not surprising that many netizens suspected AI was merely the scapegoat, the proverbial "temporary worker" who takes the blame for everything.
This also shows indirectly that, as AI technology iterates rapidly, an information gap has opened between ordinary people and front-line practitioners about what AI can do and how well it can do it.
In addition, the "Three Sheep recording scandal" exposed legal issues such as insufficient platform supervision and improper use by creators.
Discussion on AI voice infringement on content platforms
In fact, this is not the first case of infringement through AI-forged audio.
In April this year, the Beijing Internet Court heard the country's first "AI voice infringement case."
The plaintiff, a dubbing artist identified as Yin Moumou, had recorded many audio works and discovered by chance that the voice had been turned into an AI voice and sold on an app called "Magic Sound Workshop". The court ultimately ruled that the defendant's use of the plaintiff's voice without permission constituted infringement and awarded the plaintiff 250,000 yuan in compensation for various losses.
Under Article 1023 of the Civil Code of the People's Republic of China, the voice of a natural person is protected by law, in a manner similar to the right of portrait. This means that if an AI-generated voice is identifiable and the public can associate it with a specific natural person, using it without that person's permission may constitute infringement.
Li Yunkai, the plaintiff in China's first AI painting copyright case and a partner at Beijing Tianyuan Law Firm, told "AI New List":
At present, our laws do not need to be revised. AI technology is still developing, and new technologies may arrive within two years. If we were to legislate specifically for this, the process would take about 3 to 5 years. By then the technology would have changed form, and the law would in effect be a dead letter.
Our current laws already provide the basic framework. What needs to be adjusted is how these laws are interpreted and how judicial attitudes are shaped through typical cases. Only when the technology is truly mature should we push for legislation to codify the rules that have been established in judicial practice.
Beyond the infringement cases that reach the courts, AI voice infringement on content platforms is more widespread and harder to spot.
The constant stream of new AI tools has greatly lowered the threshold for creation, and AIGC has become a popular mode of content production after PGC and UGC.
On content platforms in China and abroad, it is very common to use AI voice cloning to remake popular songs, have anime and game characters perform AI covers, or make deceased celebrities speak.
Compared with the familiar notion of fan creation, AI-based derivative creation is a broader concept. Fan works are usually made within fan communities, while derivative works may come from ordinary AI enthusiasts, and AI technology gives them more room for imaginative adaptation and innovation.
AI voice derivative works that combine quality and quantity can not only reach fans by borrowing the popularity of the original IP or celebrity, but also have the potential to reach audiences beyond the fan base.
Generally speaking, given the current volume and influence of derivative content and the ecosystem of content platforms, copyright protection relies mainly on the self-discipline of creators and copyright owners and on public oversight.
If the copyright holder of the original work does not pursue a claim against a derivative work, legal issues generally do not arise.
Most content platforms also choose to let this content grow freely while imposing certain restrictions. After all, overly strict copyright enforcement would inevitably dampen creators' enthusiasm and hinder the spread of content, which would also be a huge loss for the platforms.
Of course, while encouraging content innovation, content platforms also need to improve the corresponding review, labeling, and supervision mechanisms.
On September 14, 2024, the Cyberspace Administration of China released the "Measures for Labeling of Synthetic Content Generated by Artificial Intelligence (Draft for Comments)", which further clarified the specific requirements for labeling AIGC content.
Providers of services that generate speech such as synthesized or imitated human voices, or of editing services that significantly change personal identity characteristics, should add voice prompts, audio rhythm prompts, or other labels at appropriate positions at the beginning, end, or middle of the audio, or add prominent notices in the interactive interface.
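To make the labeling requirement concrete, here is a minimal Python sketch of one way a service might add an audible marker at the beginning and end of a generated clip. The draft measures do not prescribe any of the specifics used here: the 880 Hz tone, its 0.2-second length, the 0.1-second silence gap, the 16 kHz sample rate, and the function names `make_tone` and `add_audio_label` are all our own assumptions for illustration.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def make_tone(freq_hz, duration_s, amplitude=0.3, sample_rate=SAMPLE_RATE):
    """Return a sine tone as a list of floats in the range [-1, 1]."""
    n = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def add_audio_label(samples, tone=None, gap_s=0.1, sample_rate=SAMPLE_RATE):
    """Prepend and append a marker tone, separated from the speech by silence.

    `samples` is mono audio as floats in [-1, 1]. The marker tone and gap
    length are hypothetical choices, not values taken from the regulation.
    """
    if tone is None:
        tone = make_tone(880.0, 0.2, sample_rate=sample_rate)
    gap = [0.0] * round(gap_s * sample_rate)
    return tone + gap + list(samples) + gap + tone


# Usage: one second of silence stands in for a generated speech clip.
speech = [0.0] * SAMPLE_RATE
labeled = add_audio_label(speech)
print(len(labeled) / SAMPLE_RATE, "seconds after labeling")
```

A real service would more likely use a spoken notice such as "this audio was generated by AI" rather than a bare tone. The point of the sketch is only that the label lives inside the audio itself, so it travels with the file when the clip is reposted elsewhere.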
Besides the ambiguous and hard-to-settle question of copyright ownership, another controversy comes from real ethical and moral conflicts.
For example, using AI to "resurrect" the voices and likenesses of deceased celebrities under the banner of warmth and remembrance is also seen by some as disrespectful and as exploiting the dead.
Whether it is AI voice derivative works on content platforms or the criminal case of AI dubbing in the Three Sheep incident, many issues around AI voice cloning, including copyright, ethics, data privacy, and criminal misuse, still need further discussion.
Author | Tsukiyama Tachibana Ishize
Editor | Zhang Jie