
OpenAI opens GPT-4o voice mode to some paying users, providing more natural real-time conversations

2024-07-31


IT Home reported on July 31 that on July 30 local time, OpenAI announced it has begun rolling out the GPT-4o voice mode (IT Home note: alpha version) to a small group of ChatGPT Plus users, and will gradually extend it to all ChatGPT Plus subscribers this fall.


In May this year, OpenAI Chief Technology Officer Mira Murati said in a speech:

In GPT-4o, we trained a new unified model end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model to combine all of these modalities, we are still in the early stages of exploring the model's capabilities and its limitations.

OpenAI originally planned to invite a small number of ChatGPT Plus users to test the GPT-4o voice mode at the end of June this year, but in June the company announced a postponement, saying it needed more time to polish the model and to improve its ability to detect and reject certain content.

According to previously disclosed information, the average voice response latency of the GPT-3.5 model is 2.8 seconds, while that of the GPT-4 model is 5.4 seconds, making neither well suited to voice conversation. The upcoming GPT-4o greatly shortens this latency, enabling nearly seamless conversation.

In addition to features such as quick response, lifelike voices, and speech recognition, OpenAI also claims that the GPT-4o voice model can perceive emotional tones in speech, including sadness, excitement, or singing.

OpenAI spokesperson Lindsay McCallum said: "ChatGPT cannot imitate other people's voices, including those of individuals and public figures, and will block output that differs from the preset voices."