
OpenAI's gift package: real-time speech and model distillation are here, as it sells the best products and makes the most money

2024-10-02


Author | Jessica

Today, the first of OpenAI's 2024 DevDay events was held in low-key fashion at the Gateway Pavilion on Pier 2 in San Francisco.

Unlike last year's much-trumpeted announcement, news of this year's Developer Day surfaced only once, in a post from the OpenAI Developers account on X two months ago, so many people didn't know it was happening.

Attendees were mostly invited customers and hand-picked front-line developers. The location was kept secret until the last minute, the agenda was not revealed until the day itself, and there was no livestream.

Shortly before the event, Sam Altman tweeted:

“Some new tools for developers are launching today!

From the last DevDay to this one: the cost per token has dropped 98% from GPT-4 to GPT-4o mini, the number of tokens flowing through our systems has grown 50x, model intelligence has made excellent progress, and there was a little drama along the way.

I'm looking forward to the progress from this DevDay to the next one. The path to AGI has never been clearer.”

As Romain Huet, the company's Head of Developer Experience, had previously indicated on X, this year's DevDay released no new models and focused solely on API improvements.

And on this veritable "developer day", OpenAI did indeed package up a series of important tool updates: a Realtime API, prompt caching, model distillation, vision fine-tuning, Playground improvements, plus broader access to the o1 API and higher rate limits for developers.

The new APIs are not cheap in absolute terms, but many developers note that the combination of pricing and capability makes them attractive. By presenting this sincere gift package all at once, OpenAI still intends to make the most money by selling the best products.

Image source: @swyx | x.com

1

Realtime API: build your own "Her" app in one step

As the most eye-catching release of the day, the Realtime API lets developers call gpt-4o-realtime-preview, the model underlying ChatGPT's Advanced Voice Mode, to build fast, natural speech-to-speech conversation experiences into their applications. It supports six preset voices for low-latency voice interaction.

To build a voice assistant in the past, developers had to chain multiple models for different tasks: a speech-recognition model such as Whisper transcribed audio into text, the text was passed to a language model for reasoning, and a text-to-speech model generated the spoken output. The process was not only complicated; emotion and accent were easily lost along the way, and latency was significant.
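The three-stage pipeline described above can be sketched as simple function composition. The stages are passed in as callables so the chaining logic itself stays visible; in a real app they would wrap a speech-recognition call, a chat-completion call, and a text-to-speech call. All names here are illustrative, not a specific SDK's API:

```python
from typing import Callable

def run_voice_pipeline(
    transcribe: Callable[[bytes], str],   # speech -> text (e.g. a Whisper call)
    respond: Callable[[str], str],        # text -> text (e.g. a language model)
    synthesize: Callable[[str], bytes],   # text -> speech (e.g. a TTS model)
    audio_in: bytes,
) -> bytes:
    """Chain the three models; each hop adds latency and drops prosody,
    since only plain text survives between stages."""
    text = transcribe(audio_in)
    reply = respond(text)
    return synthesize(reply)

# Stub stages show the data flow without calling any real API:
if __name__ == "__main__":
    out = run_voice_pipeline(
        transcribe=lambda audio: "what time is it?",
        respond=lambda text: f"You asked: {text}",
        synthesize=lambda reply: reply.encode("utf-8"),
        audio_in=b"\x00\x01",
    )
    print(out.decode("utf-8"))
```

The text-only handoff between stages is exactly where emotion and accent are lost: whatever the user's tone, only the transcript reaches the language model.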

The Realtime API completes the whole conversational loop in a single call, dramatically improving the naturalness and responsiveness of conversations by streaming audio in and out. It exchanges messages with GPT-4o over a persistent WebSocket connection and supports function calling, so it can respond to requests quickly and perform actions such as placing an order or providing personalized service. Like Advanced Voice Mode, it also handles interruptions automatically for a smoother user experience, making it well suited to customer support, language learning, and other highly interactive scenarios.

On site, staff demonstrated a voice assistant built with the Realtime API, which helped the 100-plus developers in the audience "call and place a delivery order for 400 strawberries".