
OpenAI rolls out advanced speech mode to some ChatGPT Plus users

2024-08-01


The new audio features let users talk to ChatGPT, receive responses in real time, and interrupt ChatGPT while it is speaking. More advanced features such as video and screen sharing will be available later.

On July 31, OpenAI announced the launch of advanced speech mode to some ChatGPT Plus users, and plans to open it to all ChatGPT Plus users in the fall.

OpenAI is driving the development of a new generation of AI voice assistants. The new audio features allow users to talk to ChatGPT, receive responses in real time, and interrupt ChatGPT while it is speaking. Voice mode can pick up on the cues conveyed by different tones of voice, sounds more natural, and can express a range of emotions in its own speech.

The new speech mode is powered by OpenAI's GPT-4o model, which combines speech, text, and vision. To gather feedback, OpenAI is initially rolling out the advanced speech features to a "small group" of ChatGPT Plus users, but says they will be available to all ChatGPT Plus users this fall.

OpenAI postponed the launch of the voice experience from late June to July, saying it needed more time to meet its release standards. OpenAI said it had tested GPT-4o's voice capabilities with more than 100 external red teamers across 45 languages. The company said it had put several safety mechanisms in place, such as working with voice actors to create four preset voices, to protect privacy and prevent the model from being used to create deepfake voices. GPT-4o does not imitate or generate other people's voices.

Previously, when OpenAI first launched GPT-4o, it faced backlash over a voice named "Sky," which sounded strikingly similar to that of actress Scarlett Johansson. Johansson issued a statement saying that OpenAI had contacted her about lending her voice to the model and that she had declined. OpenAI denied that the voice was Johansson's but has suspended the use of Sky.

OpenAI also said it has adopted filters to identify and block requests to generate music or other copyrighted audio, and has applied the same safety mechanisms used in its text models to GPT-4o to prevent it from breaking the law or generating harmful content. "We set up guardrails to block requests for violent or copyrighted content," the company said. In addition, more advanced features such as video and screen sharing will be launched later.