Could AI have produced the Three Sheep recording of Mr. Lu? My answer: of course

2024-09-27

In the early hours of this morning, countless people sent me a screenshot saying that the police had issued a notice, and asked me how anyone could tell that the Three Sheep recording was AI.

Some friends told me there is a person who claims to be the number one AI figure in China. He had previously sworn that this recording was not made by AI and that AI could not produce it. So, they asked, could there be some conspiracy behind the police notice?

I almost spat out my drink. Who is the number one AI figure in China? My first reaction was: have even academicians started wading into something this silly?

Then I looked him up... oh... never mind.

I think it is worth doing a bit of popular science on one question: can AI produce a recording at the level of the Three Sheep Mr. Lu clip?

I can give you a clear answer: yes.

First, a brief bit of background.

Three Sheep and Simba had gotten into a messy feud. They were trading blows back and forth, and it was quite a spectacle.

Then, at the height of the turmoil, an explosive recording attributed to Lu Wenqing, the chairman of Three Sheep, went viral online.

Here it is. I made some cuts and muted some of the indecent parts.

The content is explosive and sounds very believable. It involves power struggles, cheating, and so on. To sum up, the voice presented as Mr. Lu claims to have had improper relationships with all of Three Sheep's female streamers. It also mentions Zhang Yiming by name and speaks of him dismissively.

That is roughly what happened. Three Sheep then reported it to the police, saying the recording had been synthesized by AI.

There has been a lot of arguing online. The most common belief is that AI cannot produce a recording of this quality. Why? Because the "number one figure in domestic AI" said so.

This recording sounds very real, right? It has emotion, dialect, and background noise. So there are actually two questions. Was this recording made by AI? And can AI produce a recording at this level?

The first question was answered today. I unconditionally trust our public security authorities, and I believe the notices they issue are factual. There is nothing to dispute here. So the answer to the first question is clear: it was made with AI.

So the second question, the most critical one, is whether AI can produce a recording at this level.

My answer is: of course.

First, some basics. AI is a broad category with many separate tracks.

There are large language models (GPT, Claude, Doubao, etc.), AI drawing (Midjourney, Stable Diffusion, Flux, etc.), AI audio (ElevenLabs, SVC, GPT-SoVITS, Suno, etc.), AI video (Runway, Kling, Doubao, PixVerse, etc.), and AI 3D (Tripo AI, Meshy, etc.).

AI audio is in turn divided into AI-generated music, AI-generated sound effects, and voice cloning.

This recording belongs to the voice cloning track.

So don't say that if AI can do this, it must be more powerful than OpenAI or ChatGPT. They are on different tracks, so there is no comparison. It would be like saying: wow, this washing machine is really good at washing clothes, even better than that refrigerator.

Voice cloning comes in two types: TTS (text to speech) and SVC (AI voice changing).

TTS takes a few seconds to a few dozen seconds of a person's voice as material for an AI model, and then generates speech in that specific person's voice directly from text. The best open-source project right now is probably GPT-SoVITS.

SVC can be loosely understood as AI voice changing: the voice changer of the AI era. There are currently three leading projects in the AI voice changing field: So-VITS-SVC, RVC, and DDSP.
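To make the "different tracks" point concrete, here is a minimal sketch that writes the classification above down as a nested structure and looks up where a given tool sits. The grouping follows this article and is not an official standard. The names `AI_TRACKS` and `find_path` are made up for illustration, tool names are written in their usual spelling, and placing Suno under music generation is my own reading rather than something stated above.

```python
# Hypothetical illustration: the article's informal taxonomy as nested data.
# Dicts are categories, lists are example tools named in the article.
AI_TRACKS = {
    "large language models": ["GPT", "Claude", "Doubao"],
    "drawing": ["Midjourney", "Stable Diffusion", "Flux"],
    "audio": {
        "music generation": ["Suno"],  # assumption: placed here by the editor
        "sound effects": [],
        "voice cloning": {
            "TTS": ["GPT-SoVITS"],
            "SVC": ["So-VITS-SVC", "RVC", "DDSP"],
        },
    },
    "video": ["Runway", "Kling", "Doubao", "PixVerse"],
    "3d": ["Tripo AI", "Meshy"],
}


def find_path(tree, name, path=()):
    """Return the tuple of category names leading to `name`, or None."""
    if isinstance(tree, list):
        return path if name in tree else None
    for category, subtree in tree.items():
        found = find_path(subtree, name, path + (category,))
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    print(find_path(AI_TRACKS, "GPT"))  # a language model
    print(find_path(AI_TRACKS, "RVC"))  # a voice changing project
```

The two lookups land in unrelated branches, which is the washing machine versus refrigerator point: a chatbot's abilities say nothing about what a voice changing model can do.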

OK, so now it is clear that within AI voice cloning there are two ways to forge a voice.

The advantage of TTS is that it needs very little data: about 5 seconds of audio is enough to clone a voice. After that, you only need to provide text to generate audio. The cost is extremely low and results come quickly. The disadvantage is that the ceiling for emotion, pauses, and realism is very low. After listening for a few dozen seconds, you can easily pick up the AI flavor.

Until now, everyone who thought AI could not forge Mr. Lu's audio had TTS in mind by default and assumed it must have been made with TTS.

To be frank, based on the publicly available products I know of (excluding projects inside the internal labs of major companies), it is indeed rather difficult for TTS to produce audio at the level of the Mr. Lu clip.

But think about it: TTS can't do this, but what about SVC?

The disadvantage of SVC is its high cost. It requires a dataset of about 30 minutes of audio and then several hours of training to build a model of the person's voice. Finally, you need someone else to record the audio, and then you use SVC to convert that recording so the voice is replaced.

The advantage is straightforward. It retains all of the performer's emotion, pauses, tone, dialect, and so on, and the quality ceiling is close to unlimited. As long as the model is good, you cannot tell whether it is AI.

Even singing voices can be converted seamlessly. Converting a speaking voice is trivial by comparison.
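The trade-off between the two methods can be summarized in a small sketch. The figures are only the rough ones quoted in this article (about 5 seconds of material for TTS, about 30 minutes for SVC), and the class, field, and function names are hypothetical, invented for illustration. It encodes the reasoning, not any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloningMethod:
    name: str
    reference_audio_seconds: int   # rough amount of target-voice audio needed
    needs_human_performance: bool  # must a person act out the lines first?
    keeps_performer_emotion: bool  # do emotion, pauses, and dialect survive?


# Rough figures taken from the article, not measured values.
TTS = CloningMethod("TTS", 5, False, False)
SVC = CloningMethod("SVC", 30 * 60, True, True)


def plausible_methods(clip_has_rich_emotion: bool) -> list[str]:
    """Which methods could, in principle, explain a clip of this kind."""
    return [
        m.name
        for m in (TTS, SVC)
        if m.keeps_performer_emotion or not clip_has_rich_emotion
    ]


if __name__ == "__main__":
    print(plausible_methods(clip_has_rich_emotion=True))
    print(plausible_methods(clip_has_rich_emotion=False))
```

Ruling out TTS for an emotional clip therefore does not rule out AI: SVC is still on the list, because the emotion comes from a human performer and only the timbre comes from the model.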

"AI Stefanie Sun", which went viral last year, was made with SVC.

I have also written several tutorials about SVC.

Let me give you a listen too. This is the result of using SVC to convert my own voice with a Li Ronghao model.

This is straight AI output; I only added background music.

This is SVC.

So forging Mr. Lu's audio with SVC takes only a few simple steps.

1. Collect about 30 minutes of Mr. Lu's speech from the internet. This is easy to find. After all, he is a public figure.

2. Clean up the audio and train it into an AI voice model with SVC or RVC.

3. Mr. Lu is from Anqing, and there are many Anqing people in Hefei. Find someone with a similar accent and have them read out the script first.

4. Finally, use the SVC model to replace the voice in that finished recording with his.

The same goes for the female voices.

That's it. Done.

If you want it to sound even more realistic, just use Jianying (CapCut) or something similar to add some wind noise or ambient sound. There is plenty of it around, so just pick some. Traditional audio software can handle ambient sound. Of course, you can also put a dataset that already contains ambient sound into training, although I don't recommend this.

On top of that, the original video was made by sending the recording to a mobile phone, playing it on that phone, and recording it with another phone. That already adds a lot of ambient sound, and it is also mixed with friends laughing in the background, so it is a mess. These are all external factors.

So, back to the second question: can AI make a forged recording like Mr. Lu's? Of course it can.

Don't think of AI as magic, and don't think of it as rubbish either. Artificial intelligence is often human work + intelligence.

Current TTS cannot handle emotion, but why must the AI be the one to handle emotion?

Can't you just perform the lines yourself and then change the timbre afterwards? That is human work + intelligence.

Open your mind and don't be too limited.

AI is your assistant, an auxiliary tool for you to use, not something you hand everything over to while you sit back.

Finally, I want to make a statement.

I am not writing this article so that people learn about this technology and then go break the law and become outlaws.

Instead, I hope to do a little popular science about AI audio, close the information gap, and let everyone know that this technology exists and where its ceiling is. Don't lower your guard on the assumption that AI can't do this. We need to know where current AI stands and what level it can reach.

And what can be done with human work + intelligence.

The progress of science and technology is irreversible. Everyone is a drop of water in this huge torrent and will be carried forward regardless. Knowing is always better than not knowing. Only by knowing yourself and your opponent can you win every battle.

We learn many things, AI included, often in order to protect ourselves.

And to protect our families.

And then, to live a better life.

If you've read this far and found it useful, feel free to like it, mark it as read, and share it. If you want to be notified as soon as I publish, you can also give me a star ⭐. Thank you for reading, and see you next time.

Author: Kazik
