2024-09-25
No need to wait until fall: early this morning, OpenAI announced that ChatGPT's new Advanced Voice Mode will roll out to all Plus and Team users within the week.
OpenAI CEO Sam Altman posted on the X platform:
Advanced Voice Mode officially launches today! (It will be fully rolled out over the course of the week.) I hope you think it was worth the wait 🥺🫶
However, daily use of the new Advanced Voice Mode is capped, and the cap itself is subject to change. When a user has 15 minutes of usage left, the system issues a reminder.
That means treating it as an AI confidant you can talk to around the clock may not be realistic.
There are two easy ways to tell whether the rollout has reached you.
First, if it has, the following notification appears in the app the first time you open voice mode:
Second, you can check the number of available voice styles.
Previously, ChatGPT offered five preset voices, but the "Sky" voice was pulled after a legal dispute with Scarlett Johansson, the "Black Widow" actress.
Today, OpenAI added five new, more natural-sounding voices: Vale, Spruce, Arbor, Maple, and Sol.
OpenAI's website describes the nine voices as follows:
Arbor - easygoing and versatile
Breeze - lively and earnest
Cove - calm and straightforward
Ember - confident and optimistic
Juniper - open and optimistic
Maple - cheerful and outspoken
Sol - smart and relaxed
Spruce - calm and confident
Vale - smart and curious
Having listened to the new voices, netizens are split: some miss "Sky", while others are already won over by the new lineup, with Sol currently the most popular. Which voice do you prefer? Feel free to share in the comments.
So how good is ChatGPT's Advanced Voice Mode in practice?
OpenAI offered an example: if you want to sincerely apologize for being late to a grandmother who speaks only Mandarin, ChatGPT, which speaks more than 50 languages, can help you do it.
You heard that right. ChatGPT delivered the following Mandarin clearly and fluently:
Grandma, I'm sorry I'm late. I didn't mean to keep you waiting so long. How can I make it up to you?
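Advanced Voice Mode itself lives only in the ChatGPT app, but the translate-and-speak flow can be approximated with OpenAI's public API. Below is a minimal sketch assuming the `openai` Python SDK, the `gpt-4o` chat model, and the separate `tts-1` text-to-speech endpoint; the app's native speech-to-speech pipeline is not publicly exposed, so this is an illustration, not the demo's actual implementation.

```python
# Minimal sketch: approximate the "apologize in Mandarin" demo with the
# public API. Assumes `pip install openai` and an OPENAI_API_KEY in the
# environment. The ChatGPT app's own voice pipeline is not exposed.
from openai import OpenAI

client = OpenAI()

# 1. Ask the model to phrase the apology in natural spoken Mandarin.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Translate into natural spoken Mandarin: Grandma, I'm sorry "
            "I'm late. I didn't mean to keep you waiting so long. "
            "How can I make it up to you?"
        ),
    }],
)
mandarin_text = reply.choices[0].message.content

# 2. Speak it aloud with the text-to-speech endpoint. "nova" is one of
#    the six TTS API voices, unrelated to the app's nine presets.
speech = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input=mandarin_text,
)
speech.write_to_file("apology.mp3")
```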
In addition, ChatGPT's Advanced Voice Mode now supports custom instructions.
An OpenAI employee pointed out that the vocal modality (the way sound is delivered) carries many elements that text conversations may not, such as accent, intonation, and rhythm.
Users can now customize how the model speaks through these settings, describing the desired delivery more precisely.
She gave the example of asking the model to speak at a specific rhythm, enunciate clearly, speak slowly, and use the user's name regularly, and suggested starting with something simple, such as letting the model know your name and some basic information.
In one demo scenario, the user asked what fun things there were to do on the weekend. Advanced Voice Mode made suggestions based on the weather and the user's location (the Bay Area), such as hiking, a picnic, or a drive along Highway 1.
And when she expressed interest in a scenic drive and asked which route to take, ChatGPT produced a thoughtful plan.
In short, by customizing the model's voice and interaction style, users can get suggestions tailored to their preferences and needs, making the interaction more natural and useful.
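The app applies this customization through its own settings, but the underlying idea, telling the model how to speak, can be sketched as a system prompt over the public API. A hypothetical example follows; the instruction text and the user's name "Alex" are made up for illustration and are not OpenAI's wording.

```python
# Hypothetical sketch: emulate "custom instructions for how the model
# speaks" with a system prompt over the public API. The ChatGPT app
# applies its own settings internally; this only mimics the idea.
from openai import OpenAI

client = OpenAI()

SPEAKING_STYLE = (
    "You are a voice assistant. Keep a relaxed rhythm, enunciate "
    "clearly, speak slowly, and address the user by name (Alex) from "
    "time to time. The user lives in the Bay Area."
)

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SPEAKING_STYLE},
        {"role": "user", "content": "What are some fun things to do this weekend?"},
    ],
)
print(reply.choices[0].message.content)
# Expected flavor: location-aware suggestions such as a hike, a picnic,
# or a scenic drive down Highway 1, phrased in the requested style.
```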
In addition, the new version's conversation speed, fluency, and accent handling have improved markedly, so it may be worth considering as a foreign-language conversation partner.
On the user experience, OpenAI model designer Drew said that ChatGPT stays quiet while he is busy or has nothing to ask it.
When he does have a question, he asks it, and that question can stretch into a long conversation.
During the conversation, ChatGPT's voice adjusts to the tone of the exchange. To him, ChatGPT is like a friend sitting beside him, one who not only provides information but also trades ideas.
In practice, you can also use it to rehearse interviews and similar scenarios without worrying about lag.
"I mean, the latency is so low, it's like talking to another person," Drew stressed.
The first batch of user impressions can be found in APPSO's earlier article 👇
The first users of GPT-4o voice mode are here! The movie "Her" has finally come true. Netizens: I almost fell in love with her
Note that the new Advanced Voice Mode is not yet available in regions including the European Union, the United Kingdom, Switzerland, Iceland, Norway, and Liechtenstein.
The news has caused quite a stir, and the netizens affected are both angry and helpless.
Unfortunately, ChatGPT's video and screen-sharing features are still unavailable.
They were unveiled at a launch event four months ago, when OpenAI showed how to ask ChatGPT, in real time, about a math problem written on a sheet of paper in front of the camera or code on a computer screen.
OpenAI has not yet said when these features will ship.
In an AI industry where progress is measured in days, the long-delayed Advanced Voice Mode is, in effect, a stripped-down product.
There are no eye-catching new features, and even the capabilities promised at the May launch event have not been delivered. The heavily teased full rollout looks more like a strike aimed squarely at Google's new model.
Paradoxically, "coming soon" in OpenAI's lexicon seems to mean something different from what it means to the rest of us.
A long-promised feature could arrive tomorrow, or next year.
But looked at another way, OpenAI, for all its technical strength, is also a company that sells imagination. What we most look forward to may be the next promise it dangles.
After all, that has become something of a tradition for them, hasn't it?
One more thing
OpenAI's website today updated its Q&A about ChatGPT voice mode. We have summarized the practical answers below and hope they are helpful.
1. While using Advanced Voice Mode, you can keep the conversation running in the background on your phone.
2. If you switch from text or Standard Voice Mode to Advanced Voice Mode, note that you cannot switch that conversation back to text or standard voice.
3. Advanced voice conversations may degrade over in-car Bluetooth or hands-free systems, as OpenAI does not yet optimize for those devices.
4. Advanced voice conversations are not yet available for GPTs; with GPTs you can only have standard voice conversations, using their dedicated voice, Shimmer.
5. To respect the copyright of music creators, OpenAI has added a number of safeguards, including new filters, to keep voice conversations from generating musical content, including singing.
6. Advanced voice conversations are inherently multimodal, so the transcript does not always match the original audio exactly.
7. Audio from advanced voice conversations is kept with your chat history until you actively delete it. After deletion, the audio is usually removed within 30 days, but it may be retained longer in certain cases.
8. OpenAI says that by default it does not use your voice-chat audio to train models unless you choose to share it.
9. If the "improve voice chat for all users" option is turned off in Settings, you are not sharing your audio, and the system will not use it to train models.