
A surprise OpenAI update! GPT-4o's Advanced Voice Mode launches, answering questions in seconds as netizens go wild

2024-07-31



Zhidongxi (public account: zhidxcom)
Author: Vanilla
Editor: Li Shuiqing

GPT-4o's advanced voice capabilities are finally no longer vaporware!

According to Zhidongxi on July 31, early this morning OpenAI announced it is rolling out Advanced Voice Mode to a small group of ChatGPT Plus users, using GPT-4o to provide more natural real-time conversations.


▲OpenAI launches Advanced Voice Mode

After the mode went live, many netizens who received an invitation started playing with it right away and shared videos and impressions from their trials. For example, here is a fast-talking rap and beatbox performed by ChatGPT, which sounds pretty good.

//oss.zhidx.com/uploads/2024/07/66a9902a60e1d_66a9902a5d0a5_66a9902a5d078_Beatbox.mp4

Overall, ChatGPT's Advanced Voice Mode is not much different from the official demonstration: there is almost no delay, and its various tones are vivid. However, OpenAI appears to have added many safety measures, and the probability of ChatGPT rejecting user requests has increased.

ChatGPT's voice conversation feature was first introduced in September last year. In May this year, OpenAI launched its flagship model GPT-4o and publicly demonstrated a more advanced version of voice dialogue built on it. GPT-4o uses a single multimodal model, rather than three separate models, to implement voice, which reduces the latency of conversing with the chatbot. (See: OpenAI defeated voice assistants overnight! The GPT-4o model is so powerful, ChatGPT learned to read the screen, the real Her is coming)
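To make the latency argument concrete, here is a minimal, purely illustrative Python sketch. The stage names and timings are assumptions for the sake of the example, not OpenAI's actual architecture: in the legacy pipeline the delays of three models add up, while a single end-to-end model makes one pass.

```python
import time

def three_model_pipeline(audio: str) -> str:
    """Legacy approach: three separate models run back to back.
    Each stage must finish before the next can start, so their
    latencies add up. All timings here are illustrative only."""
    time.sleep(0.3)                      # 1) speech-to-text (ASR)
    text = f"transcript of {audio}"
    time.sleep(0.5)                      # 2) text-only LLM writes a reply
    reply = f"reply to '{text}'"
    time.sleep(0.4)                      # 3) text-to-speech (TTS)
    return f"audio of '{reply}'"

def single_multimodal_model(audio: str) -> str:
    """GPT-4o-style approach: one model consumes audio and emits
    audio directly, with no hand-offs between separate stages."""
    time.sleep(0.3)                      # one end-to-end pass (illustrative)
    return f"audio reply to {audio}"

for fn in (three_model_pipeline, single_multimodal_model):
    start = time.perf_counter()
    fn("user_question.wav")
    print(f"{fn.__name__}: {time.perf_counter() - start:.1f}s")
```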

At the time, OpenAI announced that the feature would roll out to free and paid users within a few weeks. But within days of the demo, OpenAI was accused by actress Scarlett Johansson, known to fans as "Black Widow" from the Avengers films, and strongly criticized by netizens, because one of ChatGPT's demo voices sounded too similar to hers.

The release of the advanced voice mode was delayed as a result, and the voice in question was later removed, although OpenAI insisted that ChatGPT had not imitated Johansson's voice.

1. Tested by more than 100 external red team members; to open to all subscribers in the fall

The Advanced Voice Mode based on GPT-4o is currently available only to a small number of ChatGPT Plus users. It offers more natural real-time conversations, allows users to interrupt at any time, and can sense and respond to user emotions.
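The "interrupt at any time" behavior (often called barge-in) is the subtle part of any such voice loop. Below is a minimal asyncio sketch of the general idea, using simulated events rather than any real OpenAI API: playback runs as a cancellable task, and detected user speech cancels it immediately.

```python
import asyncio

async def speak(reply: str) -> None:
    """Simulate streaming a spoken reply chunk by chunk."""
    for word in reply.split():
        print(f"assistant: {word}")
        await asyncio.sleep(0.3)   # stand-in for playing an audio chunk

async def conversation() -> None:
    playback = asyncio.create_task(
        speak("Here is a long and detailed answer to your question")
    )
    await asyncio.sleep(1.0)       # pretend the user starts talking after 1s
    playback.cancel()              # barge-in: stop speaking immediately
    try:
        await playback
    except asyncio.CancelledError:
        print("user interrupted; listening again...")

asyncio.run(conversation())
```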

Users who participate in this Alpha test will receive an email with instructions and receive a notification in their ChatGPT mobile app. OpenAI said that it will continue to add more users in a rolling manner and plans to make it available to every Plus subscriber in the fall.


▲Invitation email and App homepage

ChatGPT's Advanced Voice Mode was first demonstrated in May this year. Built on OpenAI's new flagship model GPT-4o, it can hold voice chats and interact with real-time video, for example understanding linear equations shown through the camera, or reading people's emotions from their expressions and intonation.

OpenAI said that since its initial release, the team has been working to enhance the security and quality of voice conversations, testing voice capabilities with more than 100 external red team members in 45 languages.

To protect privacy, OpenAI trained the model to speak only in four preset voices, built systems to block outputs that deviate from those voices, and put guardrails in place to block requests for violent or copyrighted content.
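OpenAI has not disclosed how this voice check works. As a purely hypothetical illustration, one common technique is to compare a speaker embedding of each generated clip against embeddings of the preset voices and block anything that drifts too far; the embeddings, threshold, and voice names below are placeholders, not OpenAI's system.

```python
import numpy as np

# Hypothetical speaker embeddings for the four preset voices.
# In a real system these would come from a speaker-verification model.
rng = np.random.default_rng(0)
PRESET_VOICES = {name: rng.normal(size=256)
                 for name in ("juniper", "breeze", "cove", "ember")}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_allowed(output_embedding: np.ndarray, threshold: float = 0.8) -> bool:
    """Block audio whose voice doesn't closely match any preset voice."""
    return any(cosine(output_embedding, v) >= threshold
               for v in PRESET_VOICES.values())

# An output imitating some other voice scores low against every preset.
unknown_voice = rng.normal(size=256)
print(is_allowed(PRESET_VOICES["juniper"]))  # True: matches a preset exactly
print(is_allowed(unknown_voice))             # almost surely False
```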

OpenAI plans to share a detailed report on GPT-4o’s capabilities, limitations, and safety assessments in early August.

2. The first wave of trial users gets creative: practicing French, learning to meow, and commentating on football

The first wave of trial users wasted no time putting Advanced Voice Mode through its paces, and they have shared their experiences.

Artist Manuel Sainsily turned on his camera to film his newly adopted kitten and the environment he had set up for it in real time, while asking ChatGPT for feeding advice.

//oss.zhidx.com/uploads/2024/07/66a9900fc37cb_66a9900fbde19_66a9900fbddf7_Video Dialogue.mp4

ChatGPT responded without any delay, first praising the kitten's cuteness in a doting tone, then, after asking for more details, reassuring Sainsily and telling him not to worry. Sainsily remarked: "It feels like a video call with a knowledgeable friend."

Netizen Bergara shared on Reddit that ChatGPT rejected all of his requests to sing and was unwilling to change its voice. When he asked ChatGPT to recite a poem in different styles and emotions it complied, but when he asked it to recite the poem while smiling, it refused.

Among his tests, Bergara practiced French using ChatGPT as a language coach, asking for its opinion on his pronunciation.

//oss.zhidx.com/uploads/2024/07/66a9903094c84_66a99030913bd_66a990309139a_French teaching.mp4

In response to Bergara's pronunciation, ChatGPT gave detailed suggestions on stress, word endings, and the like, and demonstrated them itself. Its teaching style is also very much one of positive reinforcement: it praised Bergara's pronunciation generously, maximizing the encouragement.

Bergara also asked ChatGPT to tell jokes about beer in shy and angry tones. ChatGPT's interpretation of shyness was to speak breathily, and of anger, to raise its volume.

//oss.zhidx.com/uploads/2024/07/66a990398daca_66a9903989c33_66a9903989c08_tell jokes in a shy and angry tone.mp4

When ChatGPT was asked to recite poetry in a sad tone, it sounded like it was on the verge of breaking down...

//oss.zhidx.com/uploads/2024/07/66a9902fc3720_66a9902fbc252_66a9902fbc230_sad tone.mp4

Bergara said that in his tests so far, ChatGPT has performed much as OpenAI demonstrated, but the rejection rate seems a bit high; he speculated this might be for safety reasons.

For example, when Bergara asked ChatGPT to sing a story about robots and love, it said it could tell the story, but only in a normal speaking tone.

//oss.zhidx.com/uploads/2024/07/66a99036460bb_66a9903642127_66a99036420ff_emotionally rich storytelling.mp4

As ChatGPT told its story, Bergara interrupted it several times and asked it to “add more emotion.” ChatGPT did so, and its tone became slower and more animated.

Some netizens have already started using ChatGPT for more offbeat tricks.

Squad co-founder and CTO Ethan Sutin had ChatGPT imitate various cat meows. The meows sound a bit "uncanny," but they must be fairly realistic, since Sutin's own cat was drawn over by them...

//oss.zhidx.com/uploads/2024/07/66a9901c00939_66a9901bf0c77_66a9901bf0c51_Learn to Meow.mp4

ChatGPT also seems to have some musical ability: Sutin asked it to play a C minor chord. Readers who know music theory can listen and judge whether it got it right.

//oss.zhidx.com/uploads/2024/07/66a9903dcfec1_66a9903dcbf91_66a9903dcbf62_chord.mp4
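For reference, a C minor triad consists of C, E♭, and G (about 261.6 Hz, 311.1 Hz, and 392.0 Hz in the fourth octave). This short numpy sketch synthesizes the chord to a WAV file, so you can compare it with ChatGPT's attempt by ear:

```python
import wave
import numpy as np

# C minor triad in the 4th octave: root, minor third, perfect fifth.
FREQS = {"C4": 261.63, "Eb4": 311.13, "G4": 392.00}
RATE, SECONDS = 44100, 2.0

t = np.linspace(0, SECONDS, int(RATE * SECONDS), endpoint=False)
# Sum the three sine waves and scale into the 16-bit sample range.
chord = sum(np.sin(2 * np.pi * f * t) for f in FREQS.values())
samples = (chord / 3 * 32767 * 0.8).astype(np.int16)

with wave.open("c_minor.wav", "wb") as f:
    f.setnchannels(1)          # mono
    f.setsampwidth(2)          # 16-bit samples
    f.setframerate(RATE)
    f.writeframes(samples.tobytes())
```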

Netizen Cristiano Giardina asked ChatGPT to play a football match commentator. He shared his initial impressions of Advanced Voice Mode: it's very fast, consistently produces interesting results, and always carries an American accent when speaking other languages.

//oss.zhidx.com/uploads/2024/07/66a9988d2ea93_66a9988d279ea_66a9988d279c4_足球解说.mp4

Netizen Kesku asked ChatGPT to speak a non-existent language and then explain how the language works. ChatGPT made up a sound-based language called Glimnar, which sounds a bit like chanting.

//oss.zhidx.com/uploads/2024/07/66a998835c09b_66a9988357da7_66a9988357d83_Creating Language.mp4

Although only a small number of users have access to ChatGPT's Advanced Voice Mode for now, as the rollout expands we will likely see more interesting uses and experiences.

Conclusion: OpenAI increases its focus on AI safety

Voice and video AI is under scrutiny for its potential to serve as a tool for fraud. Although OpenAI's voice model does not currently allow generating new voices or cloning voices, its output could still be mistaken for a real person.

In the months following its spring update, OpenAI released a series of new papers on safety and AI model alignment. Before that, its Superalignment team had been disbanded, and some current and former employees criticized the company for prioritizing new product launches over safety. For now, the slower rollout of Advanced Voice Mode seems to be a signal to users, regulators, and lawmakers that OpenAI is serious about safety.

The release of ChatGPT's Advanced Voice Mode also further differentiates OpenAI from competitors such as Meta's Llama 3.1 and Anthropic's Claude 3, and puts pressure on AI startups focused on emotional speech.