Google's version of Her flops: three attempts and a phone swap before it worked... Netizen: I could do it by hand in 10 seconds

2024-08-14

Jin Lei, reporting from Aofei Temple
QbitAI | WeChat public account QbitAI

A flop. A big flop.

Just this morning, Google's version of Her, Gemini Live, was officially released.

It is plainly positioned against OpenAI's GPT-4o, so it drew the attention of the whole tech community.

In the demo published on the official website, the photo Q&A feature looks like this:



In short, the user photographs a concert poster with a phone and asks Gemini to check their calendar to see whether they are free to attend.

Follow-up actions can build on this, such as checking ticket prices at a scheduled time.

However... when it came to the live demonstration at the launch event, things looked very different.

Watch the clip:



Video address: https://mp.weixin.qq.com/s/90pixdMaLew4lUjzjeA6jA

  • First attempt: failed.
  • Second attempt: failed.
  • Third attempt: switched phones, and it worked.

Hmm... the presenter looked visibly panicked.



Even the well-known tech outlet TechCrunch responded with this emoji:



Some netizens got snarky:

I can search my calendar in 10 seconds.



Of course, this was only a minor episode in today's Made by Google event.

Read on for more about Gemini Live.

The full picture of Google's Her

As mentioned above, Gemini Live and GPT-4o offer very similar functionality.

In addition to photo Q&A, it supports real-time conversation, and users can even interrupt Gemini in the middle of a reply.

Gemini Live is reportedly available now to Gemini Advanced subscribers on Android (English only).

The feature will expand to more languages and to iOS in the coming weeks.

As for conversation voices, there are 10 new voices to choose from. They sound like this:



As for operation, because Gemini is fully integrated into the system, you can summon it simply by long-pressing the power button or saying "Hey Google".

For example, while writing an email you can ask Gemini to generate an image for you:



However, overseas media reviews of the feature are mixed.

For example, a writer at The Verge tried it hands-on and then gave the piece this headline:

Gemini Live is faster than Google, but more awkward.



The backstory: the author's car audio system suddenly failed during a three-day road trip.

Finding a solution with the original Google Assistant took at least five minutes, while Gemini Live took just 15 seconds.

However, the author felt awkward about Gemini Live's tendency to keep talking and about an interaction model that requires the user to actively interrupt.

The author's view:

The voice and manner of speaking are so human that it feels uncomfortable to interrupt it.
Interacting with Gemini Live involves more emotional investment than simply using it as a problem-solving tool.

Similarly, on the fact that Gemini Live runs in the cloud, The Wall Street Journal offered a pointed assessment:

Improvements in conversation, regression in functionality.



On the technical level, GPT-4o is an end-to-end system, but judging from what Google has released, Gemini Live is not.

Instead, it integrates separate STT (speech-to-text), VAD (voice activity detection), LLM and TTS (text-to-speech) systems.
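
To make the difference concrete, here is a minimal Python sketch of what such a cascaded design could look like. It is only an illustration under stated assumptions: every name in it (`energy_vad`, `CascadedVoicePipeline`, the `fake_*` stubs) is hypothetical, and nothing here reflects Google's actual implementation or any real API. The point it shows is that each stage hands plain text or audio to the next, whereas an end-to-end model would map audio to audio directly.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for illustration only; none of these are Google APIs.
Frame = List[float]  # one chunk of audio samples


def energy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Toy voice activity detector: True when mean energy exceeds the threshold."""
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold


@dataclass
class CascadedVoicePipeline:
    stt: Callable[[List[Frame]], str]  # speech frames -> transcript
    llm: Callable[[str], str]          # transcript -> reply text
    tts: Callable[[str], bytes]        # reply text -> synthesized audio
    vad: Callable[[Frame], bool] = energy_vad

    def respond(self, frames: List[Frame]) -> bytes:
        # 1. VAD: keep only the frames that appear to contain speech.
        speech = [f for f in frames if self.vad(f)]
        if not speech:
            return b""  # nothing was said, so nothing to answer
        # 2. STT: turn the speech frames into text.
        transcript = self.stt(speech)
        # 3. LLM: generate a text reply from the transcript alone.
        reply = self.llm(transcript)
        # 4. TTS: synthesize the reply as audio.
        return self.tts(reply)


# Stub components so the sketch runs without any external service.
def fake_stt(frames: List[Frame]) -> str:
    return f"{len(frames)} speech frames"


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedVoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
silence = [0.0] * 160
speech = [0.5] * 160
audio_out = pipeline.respond([silence, speech, speech, silence])
print(audio_out)
```

In this sketch the LLM stage only ever sees the transcript, so anything carried by the voice itself, such as tone or pauses, is gone by the time a reply is generated. That is one commonly cited trade-off of a cascaded design compared with an end-to-end one.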



Gemini Live also appears on the new Pixel phones Google announced: the Pixel 9, Pixel 9 Pro, Pixel 9 Pro XL and Pixel 9 Pro Fold.



Among the AI features, the new Pixel phones gain one called "Add Me".

It uses augmented reality (AR) and AI to merge people from two separate photos into a single shot.



Why can't Google catch up with OpenAI?

Although Gemini Live is Google's challenge to OpenAI's GPT-4o, one trend has been obvious since the start of the large-model era.

Google can't keep up with OpenAI.

First, at the pivotal moment of ChatGPT's release, OpenAI took the lead; Google then released Bard, which, much like Gemini Live, stumbled at launch.

In the year and a half since, OpenAI seems to have led the release of every major model and application.

Google, meanwhile, has not only been slower on the technology; even in terms of public attention, an OpenAI personnel change (Ilya's departure) overshadowed Google's biggest annual event (the I/O conference).

So why hasn't Google come out ahead in the large-model era?

On this point, former Google CEO Eric Schmidt (2001-2011) shared his view in a recent talk at Stanford:

Google attaches great importance to work-life balance, for example, allowing employees to work from home.
But startups, they work really hard.



One netizen even claimed:

My brother is a top AI programmer at Google. He has 3 full-time jobs and only spends 2 hours a day at Google.



What do you think? Feel free to share your view in the comments.

Reference links:
[1] https://x.com/techcrunch/status/1823410187404743131?s=46&t=6eepxw1G6XRQ7VO0ANjJWg
[2] https://x.com/GoogleDeepMind/status/1823409674739437915
[3] https://blog.google/products/gemini/made-by-google-gemini-ai-updates/
[4] https://x.com/alexkehr/status/1823480786349383879?s=46&t=6eepxw1G6XRQ7VO0ANjJWg
[5] https://www.theverge.com/2024/8/13/24219736/gemini-live-hands-on-pixel-event
[6] https://blog.google/products/pixel/google-pixel-9-new-ai-features/#pixel9phones