
OpenAI fully releases the human-like ChatGPT voice assistant, which can speak 50 languages including Chinese

2024-09-25


Author: Li Dan

Source: Hard AI

Four months after OpenAI first unveiled it publicly, ChatGPT's advanced human-like artificial intelligence (AI) voice assistant is finally becoming available to a broad range of paying customers.

On Tuesday, September 24, Eastern Time, OpenAI announced that all subscribers to the ChatGPT Plus and Team plans will get access to the new Advanced Voice Mode, which will roll out gradually over the next few days, starting in the US market. Next week, the feature will open to subscribers of the Edu and Enterprise plans.

This means that this week, individual users on the Plus plan and small business teams on the Team plan can enable the new voice feature, conversing with GPT simply by speaking, without manually typing prompts. When users access Advanced Voice Mode in the app, a pop-up window tells them they have entered the advanced voice assistant, and they will also receive a notification from the app.

OpenAI has given the new voice version of ChatGPT two features: one stores "custom instructions" for the voice assistant, and the other is a "memory" function that remembers how the user wants the voice assistant to behave, similar to the memory feature OpenAI launched for the text version of ChatGPT in April this year. Users can rely on these features to personalize the voice mode and have the AI assistant respond according to their preferences across all conversations.

OpenAI launched five new voices in different styles on Tuesday, named Arbor, Maple, Sol, Spruce, and Vale. They join the four voices introduced with the older voice model, Breeze, Juniper, Cove, and Ember, bringing the number of selectable voices to nine. OpenAI also improved conversational speed, fluency, and accents in some foreign languages.

OpenAI said the advanced voice assistant can say "Sorry, I'm late" in 50 languages, and attached a video to a social media post demonstrating a user asking the voice assistant to apologize to his grandmother for keeping her waiting so long. In the video, the AI assistant first summarizes what the user wants to express in English, as requested; then, after the user points out that the grandmother speaks only Mandarin, the AI assistant repeats it in standard Mandarin.

The new voice features are available with OpenAI's GPT-4o model, not with the recently released preview model o1.

The launch of the new voice feature has been a long time coming. The Wall Street Journal noted that OpenAI demonstrated Voice Mode when it launched its new flagship model GPT-4o in May this year. At the time, the ChatGPT voice powered by GPT-4o sounded like an adult American woman and could respond to requests instantly. When it heard Mark Chen, OpenAI's research director, exhale heavily during the demonstration, it seemed to sense his nervousness and said, "Mark, you're not a vacuum cleaner," telling Chen to relax and breathe.

OpenAI originally planned to launch Voice Mode for a small group of Plus plan users at the end of June, but announced in June that the release would be postponed by one month to ensure the feature could safely and effectively handle requests from millions of users. At the time, OpenAI said it planned to make the feature accessible to all Plus users this fall, with the exact timetable depending on whether it met the company's high internal standards for safety and reliability.

At the end of July, OpenAI rolled out ChatGPT's Advanced Voice Mode to a limited number of paying Plus users, saying that the voice mode cannot imitate how other people speak, and that it had added new filters to ensure the software can detect and reject requests to generate music or other forms of copyrighted audio. However, the new voice mode lacks many features OpenAI demonstrated in May, such as computer vision, which would let GPT give voice feedback on a user's dance moves using only a smartphone camera.