It's lively again! OpenAI's enhanced version of "Her" is officially released, stealing the spotlight from Gemini's "production-grade" upgrade…

2024-09-25

Author: Jessica

Email: [email protected]

Today is a lively day in the AI circle, the kind we haven't seen in a long time!

The cryptic AI essay Sam Altman posted yesterday left me puzzled; now the intent behind it is obvious.

The target Altman wants to strike is his old rival Google, or more precisely, the two upgraded Gemini models Google shipped today: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002.

The sniping method is simple and crude: announce that the much-anticipated GPT voice feature officially opens today.

In less than two hours, Google's hard-won moment in the spotlight was snatched away again. If I were Google, I would be angry enough to spit blood.

1. GPT Advanced Voice is here, and it speaks more than 50 languages

OpenAI said that ChatGPT's Advanced Voice Mode will roll out gradually to all Plus and Team users this week.

While people waited, the team refined several features, adding Custom Instructions, a Memory function, five new voices, and improved accents.

Because the feature has been teased for so long, OpenAI made a point of noting: "It can say 'Sorry, I'm late' in more than 50 languages."

It also gave an example of switching from English to Mandarin: "Grandma, I'm sorry I'm late. I didn't mean to keep you waiting so long. How can I make it up to you?"

Well played: now I have become GPT's grandmother, being pressed to forgive it.

As the video shows, speech is now represented by a pulsating blue sphere rather than the black animated dots OpenAI used when it demonstrated the technology in May.

When access is granted, a prompt pops up in the app. The rollout starts with Plus and Team tier users and expands to Enterprise and Edu users next week.

ChatGPT also adds five new voices to try: Arbor, Maple, Sol, Spruce, and Vale. Together with the earlier Breeze, Juniper, Cove, and Ember, ChatGPT now has nine voices in total (Google's Gemini Live has ten).

You may have noticed that the names are all drawn from nature, from "maple" and "breeze" to "sun" and "valley," perhaps to make the experience feel more natural. One voice is missing: Sky, the one OpenAI demonstrated at its spring event, which was pulled after a legal dispute with Scarlett Johansson, the star of the movie "Her."

OpenAI has also extended some of ChatGPT's customization features to Advanced Voice Mode, including Custom Instructions, which let users personalize responses, and Memory, which lets ChatGPT recall earlier conversations for later reference.

For example, in the demo video, the user enters "My name is Charlotte and I live in the San Francisco Bay Area" in the Custom Instructions menu in the settings. When asked about weekend outdoor activities, GPT then addresses the user as Charlotte and tailors its suggestions to local weather and traffic.

OpenAI says the team has improved response speed, fluency, and accents in several foreign languages. The voice adjusts to the tone of the conversation, and you can set up scenarios that prompt it to play different roles. Latency is very low and comprehension is stronger; it genuinely feels like a natural conversation with another person.

However, the video and screen-sharing features OpenAI demonstrated four months ago are not part of this update. Back then, staff showed GPT math problems on paper and code on a computer screen and got real-time answers through natural voice dialogue. OpenAI has not yet given a timetable for launching this multimodal capability.

In addition, Advanced Voice Mode is not currently available in the EU, the UK, Switzerland, Iceland, Norway, or Liechtenstein.

Still, finally getting hands on OpenAI's version of "Her" is worth some excitement for a community grown weary of AI news. Together with the o1-preview that just set off a wave of enthusiasm, OpenAI has firmly held the industry's attention for another week.

All this excitement also gave everyone a bout of temporary amnesia:

So what did Google release today?

2. Gemini 1.5 gets two new models, with prices halved and speeds up

Google's update is actually quite significant, at least for developers.

According to Google's blog, this release updates two production-grade Gemini models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002. "Production-grade" means the model has been fully developed, tested, and optimized; it is ready for commercial deployment, can handle a large volume of user requests, and is meant for product services rather than just experiments or research.

A major upgrade to the Gemini 1.5 series unveiled at I/O in May, the new models are faster, more capable, and more cost-effective.

The main highlights:

1. Significant price cuts: input and output prices for 1.5 Pro drop by about 50%, markedly lowering build costs, especially for prompts under 128K tokens.

2. Across-the-board quality gains: improvements are especially notable in math, code generation, long-context, and vision tasks, including roughly 20% gains on benchmarks such as MATH and HiddenMath, and 2%–7% gains in vision and code use cases.

3. Higher rate limits: the rate limits for 1.5 Flash and 1.5 Pro rise from 1,000 RPM (requests per minute) and 360 RPM to 2,000 RPM and 1,000 RPM, respectively, letting developers build and process tasks faster.

4. Faster output, lower latency: output speed doubles and latency is cut to roughly a third, supporting more responsive applications.

5. More concise responses: the response style is more succinct and cost-effective, with output length shortened by 5%–20%, while refusals and evasions on many topics are reduced and usefulness stays high.

6. Multimodal and long-context support: 1.5 Pro's 2-million-token context window handles long text and multimodal tasks, such as generating content from 1,000-page PDFs or long videos.

7. Updated filter settings: the model's default safety filters are no longer applied automatically; developers can configure safety settings as needed.
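To see what the pricing and rate-limit changes above mean in practice, here is a back-of-envelope sketch in Python. The dollar figure is a hypothetical placeholder, not Google's actual rate card; only the roughly 50% cut and the 360→1,000 RPM change come from the announcement.

```python
# Back-of-envelope math for the 1.5 Pro changes above.
# OLD_INPUT_PRICE is a hypothetical $/1M-token figure, not Google's rate card.
OLD_INPUT_PRICE = 3.50
NEW_INPUT_PRICE = OLD_INPUT_PRICE * 0.5  # ~50% cut for prompts under 128K tokens

def monthly_cost(tokens_per_request: int, requests: int, price_per_million: float) -> float:
    """Input-token cost for a month of traffic at a flat per-million-token price."""
    return tokens_per_request * requests * price_per_million / 1_000_000

old = monthly_cost(100_000, 10_000, OLD_INPUT_PRICE)  # 3500.0
new = monthly_cost(100_000, 10_000, NEW_INPUT_PRICE)  # 1750.0: the bill halves

# Rate limits: 1.5 Pro moves from 360 RPM to 1,000 RPM,
# so peak throughput nearly triples (~2.78x).
speedup = 1000 / 360
```

For an app with a fixed cost per action, as the Reddit commenter below describes, that halved input price flows straight through to margin.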

Both new models are free for developers through Google AI Studio and the Gemini API, and are also available to larger organizations and Google Cloud customers on Vertex AI.
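For developers who want to try the new model, a minimal sketch of a Gemini API call might look like the following. It assumes the public REST endpoint shape and an API key supplied in the GOOGLE_API_KEY environment variable; treat the details as illustrative rather than authoritative.

```python
import json
import os
import urllib.request

MODEL = "gemini-1.5-pro-002"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(prompt: str) -> dict:
    # The generateContent body wraps the prompt in contents -> parts.
    return {"contents": [{"parts": [{"text": prompt}]}]}

def generate(prompt: str) -> str:
    # Assumption: the key is supplied via the GOOGLE_API_KEY environment variable.
    key = os.environ["GOOGLE_API_KEY"]
    req = urllib.request.Request(
        f"{ENDPOINT}?key={key}",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["candidates"][0]["content"]["parts"][0]["text"]
```

Swapping MODEL for "gemini-1.5-flash-002" targets the cheaper, faster sibling; the same request shape works for both.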

3. Gemini in the shadow of GPT

But compared with its rivals' news, many ordinary users were disappointed by Google's move, feeling it barely counted as a real "release."

Abacus.AI CEO and well-known blogger Bindu Reddy said: "Alas, OpenAI released o1, which passes IQ tests, while Google just made some minor updates to Gemini 1.5. How can this be when they have 100x the resources, 10x the talent, and 10x everything else?"

Some developers still spoke up for Google. As one netizen put it in a Reddit thread:

"This is all useful stuff for people who are actually building apps and trying to reduce costs and increase profits. The app I'm working on has a fixed cost per action determined by token length, and this has increased my profits by about 30%+. This probably won't mean much to most people. I know a lot of people will be annoyed by this 'announcement' from Google, but it's actually a nice update for developers."

Halving prices, raising speed, and cutting latency are indeed what developers want. But as many noted, the appeal may be limited to the developer community.

Some developers even scoffed: "I don't see a comparison with Claude or o1, and we're about to see the next generation of OpenAI and Anthropic models. DeepMind actually has models far superior to the current ones, but they're going straight down the enterprise route and bypassing the public. Is Gemini impressive? Not at all; it's thoroughly disappointing."

Google's model naming was also mocked by netizens as long-winded and confusing.

The Information recently published an article titled "Why AI Developers Are Skipping Google's Gemini." Drawing on interviews with multiple AI company founders and Google employees, it recounts how Gemini was "abandoned" by developers and the obstacles Google has hit in catching up with ChatGPT.

For example, compared with competitors' offerings, calling Gemini is simply too complicated for developers and enterprises. Aidan McLaughlin, founder of Topology, said it took him only 30 seconds to use OpenAI's API for the first time, but four hours to use Gemini's. With Google's large models also ranking behind OpenAI's and Anthropic's, the hurdles were not worth crossing.

Compared to ChatGPT, Gemini's unpopularity among developers seems to be an open secret.

A June survey of more than 750 tech workers by enterprise software startup Retool found that only 2.6% of respondents said they most often used Gemini to build AI applications, while more than 76% chose GPT.

Website traffic tracked by Similarweb showed that OpenAI's developer page received 82.8 million visits between June and August, versus 8.4 million for Google's.

Smaller, informal surveys tell a similar story. Late last month, Julian Saks, founder of Finetune, asked 50 AI startup developers at his San Francisco coworking space which conversational AI models they used most often. Almost all said they primarily used models from Anthropic or OpenAI; no one mentioned Gemini.

Although the Gemini models are genuinely useful for analyzing long documents or large codebases, many developers said Google's model lineup is too sprawling, the setup steps are convoluted, and its developer system differs from OpenAI's, making it harder to use. Sometimes Google's overlapping services even compete with one another in its own search results, so people stumble just trying to figure out which tool to use.

Gemini is often mocked for this on X. Brendan Dolan-Gavitt, an AI researcher at security startup XBOW, posted earlier this month a detailed account of the many steps needed just to get started with Gemini through Vertex, and it quickly went viral. Other developers piled into the replies to commiserate.

In an environment where "the world's leading engineers are using OpenAI, Claude, or Cursor," developers have little reason to try anything else. Meanwhile, declining usage means Gemini receives far less data feedback than ChatGPT, leaving Google with a murkier roadmap for improving the model.

4. The disappointment comes from high expectations of Google

Google is trying to change that perception: responding to criticism of Gemini on X, bringing in star technical talent from companies like OpenAI, merging overlapping development functions, and promoting Gemini through developer events.

Today, alongside the launch of Gemini-1.5-Pro-002, Google also held a Gemini for Work online event, spending much of it showcasing current Gemini deployments at companies such as Best Buy, Snap, UPS Capital, and Wayfair. Reportedly, Google is trying to win more large corporate customers by offering a degree of "white glove" service.

But against OpenAI's entrenched market share, Google's fightback may not be easy.

"The reality is that OpenAI is ahead of Google in terms of LLM API developer tools," said Logan Kilpatrick, product lead at AI Studio, who led developer relations at OpenAI before joining Google in April. "We have to fight their current entrenched market share among developers."

Earlier, Rowan Cheung, a well-known blogger in the AI circle, had announced that he had just finished an interview about a major AI model upgrade and that today would be a big day for developers.

Under that tweet, Logan Kilpatrick's smiley emoji looked a little awkward amid a sea of regretful replies like "Why isn't it Claude Opus 3.5?"

Conservative, controversial, and lagging behind: these are the stereotypes that Google, an AI giant, has left in the community today. The launch of Gemini-1.5-Pro-002 does not seem to have broken that deadlock.

People's disappointment with the company stems from the high expectations they held for it: with such strength and such a talent pool, it has still failed to give the world a real alternative to OpenAI, and that is what everyone finds regrettable.