Three Sheep's "wealth recording" is fake, but the technology behind it is really scary

2024-09-29

If you ask which company has been in the spotlight lately, it is none other than Three Sheep. They are in the news every few days, and it is the legal column they show up in.

Just a few days ago, before the mooncake scandal had even blown over, a recording scandal broke out as well.

Here is what happened. On September 20, a recording suddenly began circulating online, purportedly of Lu, a senior executive at Three Sheep.

The recording involves several female anchors from Three Sheep. We won't discuss the specific content here, but it was truly explosive...

After the recording was released, it quickly set off a wave of discussion online. Some people said it was purely a man bragging after too many drinks, but the discussion soon turned to whether the thing had been generated by AI, and it even drew in many so-called AI experts who started analyzing it.

Within two days, the police report came out: everyone can go home, it was AI that did it.

The other protagonist of this incident, Reecho, finally surfaced and came down hard on its own users.

Interestingly, even after the authorities weighed in, some netizens still felt that Lu was "telling the truth while drunk" and that the report simply shifted the blame onto AI, which has no way of proving its own innocence.

But whatever anyone says, the official investigation report is out. Whether you believe it or not, the tone of this matter has been set.

That said, based on my understanding of AI voice technology, something like the Three Sheep recording scandal is entirely possible, mainly because current AI voice technology is quite mature.

We only need to upload one or two sentences and leave the rest to the AI, and we can clone a person's timbre in minutes.

Put it this way: AI speech synthesis is common now and there are many open-source projects, but they fall into just two major types of technology: TTS and SVC/RVC.

TTS, simply put, is text to speech: it converts text into speech. Many AI digital humans, audiobooks, and video dubbing tracks, the "look at this man, his name is Xiaoshuai" narration everyone hears on Douyin, and the TVB female voices and "Guangxi cousin" voices in the clip material library are basically all done with TTS.

For example, Reecho, which is involved in the Three Sheep recording scandal, is a TTS model generation website. In fact, we cloned the voice of Dan Dan, the voice actor of Bad Review Jun, on their website, so everyone can listen and judge whether it sounds like the real thing.

Let's start with an "excerpt from a famous text", "spaghetti mixed with No. 42 concrete". I have to say, the voice is reproduced with 80%-90% fidelity, and the tone is so similar that if you don't listen carefully, you would think it was doing some serious science popularization.
