2024-10-02
Author | Sukhoi
Editor | Wang Bo
With executives departing and Apple withdrawing from financing negotiations, this has been a tumultuous week for OpenAI. Yet the company is pressing ahead with its effort to convince developers to build applications on its AI models.
On October 1, local time in the United States, OpenAI held DevDay in San Francisco. Unlike last year's grand event, this year's gathering was more low-key, closer to a roadshow for developers.
This time, OpenAI did not launch a major product; instead, it made incremental improvements to its existing AI tools and APIs.
The company unveiled four updates: vision fine-tuning, the Realtime API, model distillation, and prompt caching.
For example, the public beta of the Realtime API lets developers create applications that generate AI voice responses quickly. The new technology is not only fast to respond but also offers six different voice options, all developed by OpenAI itself, which avoids third-party copyright issues. The API does not simply copy ChatGPT's Advanced Voice Mode, but its functionality is broadly similar.
Romain Huet, OpenAI's director of developer experience, also demonstrated how to use o1 to build an iPhone iOS application from a prompt in about 30 seconds.
Huet demonstrates building an iPhone iOS application. Image credit: Romain Huet's X.
Over the past two years, OpenAI has cut the cost for developers to access its API by 99% in response to market pressure from competitors such as Meta and Google. And the framing of the new tools makes the direction clear: OpenAI's strategy favors strengthening its developer ecosystem rather than competing directly in end-user applications.
Before the event, OpenAI's chief product officer Kevin Weil said that the resignations of chief technology officer Mira Murati and chief research officer Bob McGrew would not affect the company's long-term development. Despite the "frequent personnel changes," he said, OpenAI can still "maintain development momentum."
As tech groups like Google and Apple race to roll out so-called AI agents to consumers, OpenAI believes AI assistants will "go mainstream" next year. The capabilities of AI assistants, including reasoning and completing complex tasks, have become the latest battleground for technology companies, each hoping to turn this rapidly developing technology into new revenue streams.
"The hope is that AI's modes of interaction can cover all the ways humans interact," Weil said. "The development of agent systems will make this possible." In short, the goal is for AI to imitate or replicate the full range of human communication, whether verbal, emotional, or non-verbal, so that interaction between humans and AI feels as natural and seamless as possible.
Beyond OpenAI, companies such as Microsoft, Salesforce, and Workday are also placing agent capabilities at the core of their AI plans, while Google and Meta have said that integrating AI models into their products is a key focus area.
Last year, OpenAI released its Assistants API to let developers build agents on its technology, but the company has acknowledged that those plans were hampered by the limited capabilities of earlier models.
Weil said the improvements in thinking and reasoning delivered by OpenAI's latest models will show up in its own products, such as ChatGPT, and in the startups and developers building applications on its API, but he did not say whether OpenAI will immediately develop its own AI agent.
OpenAI demonstrated a live conversation with an AI system tasked with finding and purchasing locally available products. In the demo, a user asked to buy strawberries, and the AI then called the merchant to place the order according to the user's instructions.
Demonstration of AI buying strawberries based on prompts. Image source: Ken Collins' X.
OpenAI stresses that anyone using the technology must make clear that callers are talking to an AI, not a human, and that the API gives developers only a limited set of preset voices, not the ability to create new ones.
"If we do it right, we'll have more time to focus on what's important and less time staring at our phones," Weil said.
1. Prompt Caching: A Savior for Developer Budgets
Prompt caching is one of the most significant launches of the event, designed to reduce costs and latency for developers.
Many developers building AI applications reuse the same context across multiple API calls, such as when editing a code base or holding a long, multi-turn conversation with a chatbot. Prompt caching automatically applies a 50% discount to input tokens the model has recently processed, by reusing the input it has recently seen.
Cached input tokens cost up to 50% less than uncached tokens across the GPT model family. Image source: OpenAI.
Starting today, prompt caching applies automatically to the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as to fine-tuned versions of these models. Cached prompts are discounted relative to uncached ones.
API calls to supported models benefit from prompt caching automatically for prompts longer than 1,024 tokens. The API caches the longest prefix of a prompt it has computed before, starting at 1,024 tokens and growing in increments of 128 tokens. If a user frequently sends prompts with a common prefix, OpenAI applies the caching discount automatically, with no changes needed to the API integration.
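In practice, developers benefit simply by keeping the long, unchanging part of a prompt (system instructions, reference documents, few-shot examples) at the front and the variable part at the end, so consecutive calls share the longest possible prefix. Below is a minimal sketch using OpenAI's official Python SDK; the file name is invented for illustration, and the `cached_tokens` usage field is as documented at the feature's launch, so verify against the current SDK:

```python
# Minimal sketch: prompt caching kicks in automatically for prompts over
# 1,024 tokens. Keeping the long, static context first means repeated calls
# share a prefix the API can serve from cache at the discounted rate.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A long, unchanging prefix (e.g., a code-base summary); illustrative file name.
static_context = open("codebase_summary.txt").read()

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": static_context},  # shared prefix -> cacheable
            {"role": "user", "content": question},          # variable suffix
        ],
    )
    # Newer SDK versions report how many input tokens were served from cache.
    details = response.usage.prompt_tokens_details
    print(f"{details.cached_tokens} input tokens served from cache")
    return response.choices[0].message.content

ask("Where is the retry logic implemented?")
ask("Which module handles authentication?")  # second call should hit the cache
```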
The cache is usually cleared after 5 to 10 minutes of inactivity and is always removed within an hour of its last use. Like all API services, prompt caching is covered by OpenAI's enterprise privacy commitments, and caches are not shared between organizations.
The significant reduction in costs opens the door for companies to build applications that were previously too expensive to be viable.
Olivier Godement, OpenAI's platform product lead, told a small press conference at OpenAI's San Francisco headquarters: "We have been very busy. Two years ago GPT-3 was leading the technology in its class, but now we have achieved a nearly 1,000-fold reduction in related costs." He added, proudly, that he could not think of another technology that had achieved a cost reduction of similar scale in just two years.
2. Vision Fine-Tuning: The New Frontier of Visual AI
Another big announcement is the introduction of vision fine-tuning for OpenAI's latest large language model, GPT-4o. Developers can now fine-tune on images as well as text, which could transform areas such as self-driving cars, medical imaging, and visual search.
Since the introduction of text fine-tuning, hundreds of thousands of developers have used text-only datasets to optimize models for specific tasks. But in many cases text fine-tuning alone is not enough. With vision fine-tuning, developers can improve GPT-4o's performance on vision tasks by uploading a dataset of as few as 100 images, which is especially useful when an application processes large amounts of mixed text and image data.
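To make that concrete, here is a hedged sketch of what a single training example might look like. Vision fine-tuning uses the same JSONL chat format as text fine-tuning, with images supplied as content parts; the file names, prompt, and label below are invented for illustration, so check OpenAI's fine-tuning docs for the authoritative schema:

```python
# Sketch of one vision fine-tuning example (JSONL, chat format).
# Each line pairs an image (as a URL or base64 data URL) with the target answer.
import base64
import json

def encode_image(path: str) -> str:
    """Embed a local image as a base64 data URL, one accepted way to supply images."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

example = {
    "messages": [
        {"role": "system", "content": "Count the lanes visible in the road image."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How many lanes does this road have?"},
                {"type": "image_url", "image_url": {"url": encode_image("road_001.jpg")}},
            ],
        },
        {"role": "assistant", "content": "4"},
    ]
}

# A training file is just one JSON object like this per line.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```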
Grab, Southeast Asia's leading food delivery and ride-sharing company, has already used the technology to improve its mapping services, according to OpenAI. Using just 100 examples, Grab improved lane-counting accuracy by 20% and speed-limit-sign localization accuracy by 13%.
Example of a speed limit sign correctly localized by the vision fine-tuned GPT-4o model. Image source: OpenAI.
Automat used vision fine-tuning to train GPT-4o on a dataset of screenshots so that it can recognize UI elements on screen, improving the success rate of its automation tools. With this approach, the success rate of Automat's robotic agent rose from 16.60% to 61.67%.
A desktop bot identifies the centers of UI elements after vision fine-tuning on website screenshots. Source: OpenAI.
These real-world deployments show how vision fine-tuning can significantly enhance AI services across industries using only small batches of visual training data.
Vision fine-tuning is now available to all paid-tier developers and supports the latest GPT-4o model. Developers can extend their existing training datasets with images. In addition, OpenAI is offering 1 million free training tokens per day through October 31, 2024, after which fees for fine-tuning training and inference will be set.
3. Realtime API: Bridging the Gap in Conversational AI
The Realtime API is now in public beta. It enables developers to build low-latency, multimodal experiences, especially speech-to-speech applications, which means developers can start adding voice controls like ChatGPT's to their apps.
To illustrate the API's potential, OpenAI demoed an updated version of Wanderlust, a travel-planning app first shown at last year's conference.
With the Realtime API, users can talk directly to the app and plan a trip in a natural, conversational way. The system even allows interruptions mid-speech, mimicking human conversation.
Healthify, a nutrition and fitness coaching app, uses the Realtime API to enable natural conversations with AI. Image source: OpenAI.
While travel planning is just one example, the Realtime API opens up a wide range of possibilities for voice applications across industries, from customer service to education and accessibility tools, giving developers a powerful new resource for building more intuitive and responsive AI-powered experiences.
"Whenever we design a product, we basically think about both startups and enterprises at the same time," Godement explained. "So in the alpha phase, we have a lot of enterprises using the API, as well as new models for new products."
The Realtime API essentially simplifies the process of building voice assistants and other conversational AI tools, eliminating the need to stitch together separate models for transcription, reasoning, and text-to-speech.
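Under the hood, the beta exposes a single WebSocket connection that streams JSON events in both directions. The text-only sketch below uses the endpoint, beta header, and event names from the launch announcement; treat them as assumptions and confirm against the current reference (audio streaming adds further event types):

```python
# Minimal text-only sketch of a Realtime API session over WebSocket.
# Endpoint, header, and event names are from the beta announcement; verify them.
import asyncio
import json
import os

import websockets  # pip install websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: websockets >= 13 renames extra_headers to additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a response; a voice app would use ["audio", "text"].
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user briefly.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.text.delta":
                print(event["delta"], end="", flush=True)  # stream tokens as they arrive
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```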
Early adopters such as the nutrition and fitness coaching app Healthify and the language-learning platform Speak have already integrated the Realtime API into their products. The API has the potential to create more natural and engaging user experiences in fields ranging from healthcare to education.
The Realtime API's pricing, while not cheap ($0.06 per minute of audio input and $0.24 per minute of audio output), still represents a compelling value proposition for developers building voice-based applications.
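To put those rates in perspective, a ten-minute voice session split evenly between user speech and AI replies would cost roughly 5 × $0.06 + 5 × $0.24 = $1.50, before any text-token charges.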
4. Model Distillation: Towards More Accessible AI
Perhaps the most transformative announcement of the event is the introduction of model distillation.
Its integrated workflow lets developers use the outputs of advanced models such as o1-preview and GPT-4o to improve the performance of more efficient models such as GPT-4o mini. Small companies can thus tap capabilities close to those of the advanced models without worrying about the computing costs.
Fine-tuning demo. Source: OpenAI.
Model distillation addresses the AI industry's longstanding gap between cutting-edge, resource-intensive systems and more accessible but less powerful ones.
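Concretely, the pattern pairs OpenAI's existing APIs: collect outputs from a large "teacher" model, write them as a fine-tuning dataset, then fine-tune a smaller "student" on it. OpenAI's integrated workflow layers stored completions and evaluations on top of this; the rough sketch below shows only the underlying idea, and the prompts and student-model snapshot name are illustrative assumptions:

```python
# Rough distillation sketch: answers from a large "teacher" model become
# training data for a cheaper "student" via the standard fine-tuning API.
import json

from openai import OpenAI

client = OpenAI()
prompts = [
    "Summarize the warning signs of dehydration.",
    "List common over-the-counter drug interactions with ibuprofen.",
]

# 1) Collect teacher outputs and write them as chat-format training examples.
with open("distill.jsonl", "w") as f:
    for p in prompts:
        answer = client.chat.completions.create(
            model="gpt-4o",  # teacher model
            messages=[{"role": "user", "content": p}],
        ).choices[0].message.content
        f.write(json.dumps({"messages": [
            {"role": "user", "content": p},
            {"role": "assistant", "content": answer},
        ]}) + "\n")

# 2) Upload the dataset and fine-tune the smaller student model on it.
training_file = client.files.create(file=open("distill.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative student snapshot name
)
print("fine-tune job:", job.id)
```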
Say a small medical technology startup is developing an AI diagnostic tool for rural clinics. Using model distillation, the team can train a compact model that runs on a standard laptop or tablet while capturing most of the diagnostic ability of the larger model.
This could bring sophisticated AI capabilities into resource-constrained settings and improve healthcare outcomes in underserved areas.
It is not hard to see in this update an important strategic shift: OpenAI is focusing more on developing its ecosystem than on chasing eye-catching product launches, even if that strategy speaks less directly to the public.
Compared with the exciting 2023 developer day, which launched the GPT Store and custom GPT tools, this year's event was much lower-key. Rapid change in the AI field, significant advances by competitors, and growing concerns about the availability of training data have all pushed OpenAI to concentrate on refining existing tools and strengthening developer capabilities.
By improving model efficiency and reducing costs, OpenAI hopes to stay ahead of fierce competition while addressing concerns about resource intensity and environmental impact. Its success will depend heavily on its ability to cultivate a vibrant developer ecosystem.
References:
"Introducing the Realtime API," OpenAI
"Introducing vision to the fine-tuning API," OpenAI
"Prompt Caching in the API," OpenAI
"Model Distillation in the API," OpenAI
"OpenAI's DevDay 2024: 4 major updates that will make AI more accessible and affordable," VentureBeat
"OpenAI's DevDay brings Realtime API and other treats for AI app developers," TechCrunch
(Cover image source: OpenAI)