
Large models provide fertile ground for deepfakes, and the industry calls for interdisciplinary collaboration on detection technology

2024-07-24


The development of detection technology requires interdisciplinary cooperation. Current detection technology is based mainly on software algorithms, but in the future it will move toward the integration of software and hardware.



In the era of large models, the boundary between AI-synthesized speech and real speech has become increasingly blurred, and detection technology urgently needs to catch up. On July 23, the final of the 9th CreditEase Global Artificial Intelligence Algorithm Competition, themed on deepfake speech detection, was held in Shanghai. Contestants were encouraged to use deep learning and adversarial AI techniques to develop models that can accurately identify fake speech.

A deepfake is highly realistic false content generated with deep learning and artificial intelligence. The rise of large models has created a breeding ground for deepfakes: given only a prompt, an AI system can output images, video, and audio that are difficult to distinguish from the real thing.

Take fake speech as an example: large models can generate a wide variety of fake voices that are more realistic, human-like, and fluent, posing a greater challenge to detection. "In some high-value scenarios, AI-generated voice fraud occurs frequently, yet voice anti-spoofing technology currently lags behind speech synthesis," said Chen Lei, vice president of CreditEase and head of big data and AI.

In the finals, contestants applied different algorithms and training strategies to identify fake speech, including recognition based on large models and traditional end-to-end recognition. End-to-end models have fewer parameters and target narrower, vertical problems; large models have more parameters, higher data requirements, and stronger generalization, and they markedly improve the recognition rate on fake speech generated by large models.
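The "end-to-end" approach described above maps raw audio directly to a real/fake decision. As a minimal illustrative sketch only: the toy features and threshold below are assumptions for demonstration, not the contestants' methods; real detectors learn their features with deep networks.

```python
# Toy end-to-end style classifier: raw samples in, real/fake label out.
# The zero-crossing-rate feature and the 0.3 threshold are illustrative
# assumptions, not a real anti-spoofing rule.

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

def classify(samples, threshold=0.3):
    """Toy decision rule: an unusually high zero-crossing rate is treated
    here as a synthesis artifact (an assumption for illustration only)."""
    return "fake" if zero_crossing_rate(samples) > threshold else "real"
```

In practice the hand-written feature would be replaced by a learned front end, but the overall shape — waveform in, binary decision out, with a compact model — is what distinguishes the end-to-end systems from the large-model-based ones described above.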

According to Lv Qiang, an algorithm scientist at CreditEase, the preliminary-round dataset consisted mainly of fake speech generated by traditional end-to-end TTS (text-to-speech), which was relatively easy to identify. For the first time, the semi-final dataset included fake speech generated by the latest large models, re-recorded fake speech, and clips spliced from real and fake speech, covering more than five languages including English, French, and Spanish, which raised the difficulty. "Adding large-model-generated fake speech to the semi-finals made the competition harder. It also shows that the latest large models have become better at making the fake look real, and deepfake detection technology must keep up."

"We deliberately added data from new scenarios, such as re-recorded fake voices, which are produced by repeatedly re-recording a real voice; we consider these fake," Lv Qiang said. For this scenario, the competition spliced together slices of real and fake speech to construct adversarial data, preventing contestants from labeling clips simply by listening to them. "If even one slice is fake, the whole clip is labeled fake. This is closer to real-world conditions, but the recognition challenge is great. Solving the re-recording problem and real-versus-fake adversarial splicing would have academic value." Lv Qiang added that multimodal information such as text and video can assist voice anti-spoofing, and that large models and multimodality will be important directions for the field.

Forgery and forgery-detection technologies are locked in a contest, each driving the other upward in a spiral. Chen Lei said that research on large speech models should abstract application problems into academic ones and, once those are solved, engineer the results to meet the real needs of specific business scenarios. Developing detection technology requires interdisciplinary cooperation: today it relies mainly on software algorithms, but in the future it will integrate software and hardware, using hardware-level provenance tracing and sound capture to help control fake-speech risks at the device level.

"There is no end to counterfeit detection. As long as generative technology keeps advancing, forgery will continue." Chen Lei said that after the competition, CreditEase will open-source the data for broader academic research and share the contestants' de-identified materials for study. It will also absorb cutting-edge modeling ideas into business scenarios and build an AIGC counterfeit-detection platform. He believes generative AI must comply with governance rules: AI governance needs top-level design from regulators to guide and constrain it. At the same time, he called for ecosystem co-construction across the industry to prevent systemic risk.