2024-08-19
On August 19, iFlytek announced an update to its Spark voice model and officially launched Spark ultra-fast super-anthropomorphic interaction. The capability will be applied to the "Xiaoxing Chat" function of the iFlytek Spark app, which will open to the public at the end of August.
Judging from the official demonstration results, Spark's ultra-fast super-anthropomorphic interaction has achieved breakthroughs in four aspects: response and interruption speed, emotional perception and resonance, voice-controlled expression, and character role-playing.
In terms of response speed, Spark's ultra-fast super-anthropomorphic interaction supports multi-turn conversation, and its response speed is comparable to that of GPT-4o, close to the rhythm of normal human conversation. Users can interrupt or interject at any point, and the system responds quickly, which makes the conversation feel seamless.
In terms of emotional perception and resonance, Spark's ultra-fast super-anthropomorphic interaction can recognize user emotions such as joy, anger, sadness, and happiness. It not only judges the emotion from the sound itself but also responds with an appropriate emotion. The system can also recognize non-verbal sounds, such as coughs and pet calls, and respond accordingly.
In terms of controllable voice expression, the machine voice in earlier voice interaction could not be adjusted. Now a spoken instruction is enough to make the super-anthropomorphic voice change its emotion, style, dialect, and intensity.
In terms of role-playing, it supports simulating multiple roles and can switch roles on request, making the conversation more fun and interactive.
iFlytek reportedly uses a unified neural network for end-to-end speech-to-speech modeling. This replaces the traditional three-step voice interaction pipeline of speech-to-text, large-model response generation, and speech synthesis, which greatly shortens response time and makes the interaction more human-like and fluent. In addition, by training decoupled representations of multiple speech attributes, the system can control elements such as content, timbre, and emotion more flexibly to suit different scenarios and needs.
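The structural difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not iFlytek's implementation: every function name below is hypothetical, and the stubs simply echo their input so the example runs without any speech library.

```python
from typing import Callable

Audio = bytes  # stand-in type for raw audio; an assumption for this sketch


def make_cascaded(
    asr: Callable[[Audio], str],
    llm: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Callable[[Audio], Audio]:
    """Traditional pipeline: three separate stages chained one after another."""

    def respond(audio: Audio) -> Audio:
        text = asr(audio)   # stage 1: speech-to-text
        reply = llm(text)   # stage 2: large model generates a text response
        return tts(reply)   # stage 3: speech synthesis

    return respond


def make_end_to_end(model: Callable[[Audio], Audio]) -> Callable[[Audio], Audio]:
    """End-to-end approach: a single model maps input speech to output speech."""
    return model


# Hypothetical stubs that treat the audio bytes as UTF-8 text.
def stub_asr(audio: Audio) -> str:
    return audio.decode("utf-8")


def stub_llm(text: str) -> str:
    return "echo: " + text


def stub_tts(text: str) -> Audio:
    return text.encode("utf-8")


def stub_speech_model(audio: Audio) -> Audio:
    return b"echo: " + audio


cascaded = make_cascaded(stub_asr, stub_llm, stub_tts)
direct = make_end_to_end(stub_speech_model)
```

In the cascaded version each stage must finish before the next starts, and anything the intermediate text does not capture (tone, emotion, non-verbal sounds) is lost between stages. The single-model version has no such hand-offs, which is the property the article credits for the shorter response time.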
iFlytek said that Spark ultra-fast super-anthropomorphic interaction will be fully available at the end of August, and that it plans to keep expanding its interactive functions and modes to give users richer and more practical intelligent voice services. As the technology matures and application scenarios expand, intelligent voice technology is expected to see explosive growth in fields such as smartphones and smart cars. According to IDC's forecast, the global intelligent voice service market will reach approximately US$73.16 billion by 2030, with a compound annual growth rate of 27%. (Xianxian)
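To put the 27% compound annual growth rate in perspective, the short calculation below works backward from the 2030 figure. The article does not state the forecast's base year, so the six-year horizon (2024 to 2030) is purely an assumption, and the resulting base value is an implied number, not one reported by IDC.

```python
def implied_base_value(future_value: float, cagr: float, years: int) -> float:
    """Invert compound growth: future = base * (1 + cagr) ** years."""
    return future_value / (1.0 + cagr) ** years


def project(base_value: float, cagr: float, years: int) -> float:
    """Apply compound growth forward for the given number of years."""
    return base_value * (1.0 + cagr) ** years


FORECAST_2030_USD_BN = 73.16  # from the article
CAGR = 0.27                   # from the article
ASSUMED_YEARS = 6             # assumption: base year 2024, not stated in the article

base = implied_base_value(FORECAST_2030_USD_BN, CAGR, ASSUMED_YEARS)
print(f"Implied market size {ASSUMED_YEARS} years before 2030: about US${base:.1f} billion")
```

Under that assumption, the forecast implies the market roughly quadruples over the six years, since 1.27 raised to the sixth power is a little over 4.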
This article is from NetEase Technology Report.