
OpenAI developer conference hands out gift packs: sharply lower model costs, AI voice-enabled apps, and small models "inheriting" the performance of large models

2024-10-02


Author: Li Dan

On Tuesday, October 1st, Eastern Time, OpenAI held its annual developer conference, DevDay. This year's event featured no major product launches and was lower-key than last year's, but OpenAI still handed developers several big "gift packages": improvements to its existing artificial intelligence (AI) tools and API suite.

This DevDay introduced a series of new tools built around four major innovations: prompt caching, vision fine-tuning, the Realtime API, and model distillation. Together they bring developers good news on cost, better visual understanding in models, improved voice AI capabilities, and stronger small-model performance.

Some commentators noted that this year's DevDay focused on empowering developers and showcasing stories from the developer community, a sign that OpenAI's strategy has shifted as competition in the AI field intensifies. The new tools highlight a strategic focus on strengthening the developer ecosystem rather than competing directly in end-user applications.

Some media outlets mentioned that at a press conference before the DevDay event, OpenAI chief product officer Kevin Weil addressed the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew, saying the departures would not affect the company's development: "We will not slow down."

Prompt caching can cut input token costs by up to 50%

Prompt caching is regarded as the most important update of this DevDay. The feature is designed to lower developers' costs and reduce latency.

OpenAI's prompt caching system automatically applies a 50% discount to input tokens the model has recently processed, which can translate into significant savings for applications that frequently reuse context. Such cost reductions open up real opportunities for enterprises and startups to explore applications that were previously out of reach because of prohibitive costs.

Olivier Godement, OpenAI's platform product lead, said that GPT-3 was a great success two years ago, and that OpenAI has since cut related costs by nearly 1,000 times. He said he could not think of another example of costs falling by that much in two years.

The following OpenAI chart shows how prompt caching can significantly reduce the cost of running AI models: compared with uncached tokens across various GPT models, cached input tokens cost up to 50% less.
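As a back-of-the-envelope illustration, the discount can be modeled directly. The sketch below assumes the flat 50% discount on cached input tokens described above; the per-million-token price is a made-up placeholder, not OpenAI's actual rate.

```python
def input_cost(total_tokens, cached_tokens, price_per_mtok, cache_discount=0.5):
    """Dollar cost of input tokens when recently seen (cached) tokens
    are billed at a discount, as with OpenAI's prompt caching."""
    uncached = total_tokens - cached_tokens
    discounted = cached_tokens * (1 - cache_discount)
    return (uncached + discounted) * price_per_mtok / 1_000_000

# A request with a 10,000-token prompt at a hypothetical $2.50 per
# million input tokens; in the second call 8,000 of those tokens hit the cache.
cold = input_cost(10_000, 0, 2.50)      # no cache hits: $0.025
warm = input_cost(10_000, 8_000, 2.50)  # cache hits: $0.015
```

For an app that resends a long system prompt with every request, most of the prompt is cached on repeat calls, so effective input cost approaches the halved rate.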

Vision fine-tuning: a new frontier for visual AI

DevDay also announced that OpenAI's latest large language model (LLM), GPT-4o, now supports vision fine-tuning. The feature lets developers customize a model's visual understanding using images as well as text.

This is a major update, billed as a new frontier for visual AI, and it could have far-reaching impact in areas such as self-driving cars, medical imaging, and visual search.

OpenAI said Grab, a Southeast Asian company often described as a combination of Meituan and Didi, has used the technology to improve its mapping services. With just 100 examples, Grab improved lane-counting accuracy by 20% and speed-limit-sign localization accuracy by 13%.

This real-world application shows how vision fine-tuning with small batches of visual training data can significantly enhance AI services across a variety of industries.
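For reference, a vision fine-tuning dataset is a JSONL file of chat examples whose user messages include images. The helper below sketches one such line, following the image-in-message format of the chat API; the prompt, URL, and label are invented placeholders, and the exact dataset schema should be checked against OpenAI's fine-tuning guide.

```python
import json

def vision_example(prompt: str, image_url: str, label: str) -> str:
    """One JSONL line for a vision fine-tuning dataset: a user turn
    containing text plus an image, and the desired assistant answer."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
            {"role": "assistant", "content": label},
        ]
    })

# A lane-counting example of the kind Grab's use case suggests:
line = vision_example(
    "How many lanes does this road have?",
    "https://example.com/road.jpg",  # placeholder image URL
    "3",
)
```

A hundred such lines, per the Grab figures above, can already move accuracy meaningfully.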

The Realtime API bridges the conversational AI gap

DevDay also saw the release of the Realtime API, currently in public beta. It simplifies the process of building voice assistants and other conversational AI tools, eliminating the need to stitch together separate models for transcription, inference, and text-to-speech.

The new offering lets developers create low-latency, multimodal experiences, especially speech-to-speech apps. In practice, developers can start adding ChatGPT-style voice interaction to their applications.

To illustrate the API's potential, OpenAI showed an updated version of Wanderlust, a travel planning app it demonstrated at last year's conference.

With the Realtime API, users can speak directly to the new version of the app and plan an itinerary through natural conversation. The system even lets users interrupt mid-sentence, mimicking human dialogue.
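Under the hood this is a websocket protocol in which the client sends JSON events. The sketch below only builds a `session.update` configuration event locally; the field names (voice, server-side turn detection) are based on the beta documentation and may change, and the instructions string is a placeholder.

```python
import json

def session_update(voice: str = "alloy",
                   instructions: str = "You are a helpful travel planner.") -> str:
    """Build a Realtime API `session.update` event. Server-side voice
    activity detection (`server_vad`) is what lets the server notice
    when the user starts speaking, enabling mid-sentence interruption."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            "modalities": ["audio", "text"],
            "turn_detection": {"type": "server_vad"},
        },
    })
```

A client would open a websocket to the Realtime endpoint, send this event, then stream microphone audio up and play back the audio the server streams down.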

Travel planning is just one example; the Realtime API opens up a wide range of possibilities for voice apps across industries. Whether building customer service, education, or accessibility tools for people with disabilities, developers can now use these new resources to create more intuitive and responsive AI-driven experiences.

Some apps, including the nutrition and fitness coaching app Healthify and the language learning platform Speak, have already begun integrating the Realtime API into their products.

Commentators note that the Realtime API is not cheap, charging $0.06 per minute of audio input and $0.24 per minute of audio output, but it can still represent a significant value proposition for developers building voice-based apps.
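At those rates, estimating audio cost is simple arithmetic. The helper below hardcodes the quoted beta prices, which could of course change.

```python
def realtime_audio_cost(input_minutes: float, output_minutes: float,
                        in_rate: float = 0.06, out_rate: float = 0.24) -> float:
    """Estimated audio cost in dollars at the quoted beta rates:
    $0.06 per minute of audio in, $0.24 per minute of audio out."""
    return input_minutes * in_rate + output_minutes * out_rate

# A 10-minute support call, roughly half caller speech, half model speech:
cost = realtime_audio_cost(5, 5)  # -> 1.50
```

A dollar and a half for a ten-minute call is far above text-API pricing, but may still beat staffing a phone line.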

Model distillation gives small models frontier-model capabilities

Model distillation is regarded as OpenAI's most transformative new tool this time. The integrated workflow lets developers fine-tune relatively small, cost-effective models using the outputs of frontier models such as o1-preview and GPT-4o, thereby improving the performance of more efficient models such as GPT-4o mini.

This approach makes it possible for smaller companies to get capabilities similar to those of frontier models without incurring the computational cost of running them. It helps bridge the long-standing gap in the AI industry between cutting-edge, resource-intensive systems and more accessible but less powerful ones.

For example, imagine a small medical technology startup that wants to build an AI-driven diagnostic tool for rural clinics. Using model distillation, the company could train a compact model that captures much of the diagnostic power of a larger model while running on a standard laptop or tablet.
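Mechanically, this kind of distillation is supervised fine-tuning on the teacher's outputs. The sketch below turns (prompt, teacher answer) pairs, for instance answers collected from o1-preview, into chat-format JSONL training lines for the smaller student model; the example pairs are illustrative, not real medical advice.

```python
import json

def distillation_lines(pairs):
    """Convert (prompt, teacher_answer) pairs into chat fine-tuning
    JSONL lines, so a small student model learns to imitate the
    frontier teacher's answers."""
    for prompt, answer in pairs:
        yield json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        })

# Illustrative teacher outputs captured from a frontier model:
pairs = [
    ("Patient reports fever and cough for 3 days. Likely causes?",
     "Common causes include a viral upper respiratory infection..."),
]
jsonl = "\n".join(distillation_lines(pairs))
```

The resulting file is then uploaded as a normal fine-tuning dataset for the smaller model.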

Model distillation can thus bring sophisticated AI capabilities to resource-constrained environments, potentially improving medical care in underserved areas.