2024-08-14
New Intelligence Report
Editor: Editorial Department
【New Intelligence Introduction】Is Google's version of "Her" arriving before OpenAI's? Google's voice mode, Gemini Live, will soon reach the world's 3 billion Android and 2.2 billion iOS devices. Although the live demo still stumbled a bit, Google is determined not to wait any longer: it intends to intercept OpenAI and take the fight to Apple!
Has OpenAI just been intercepted?
Hot on the heels of OpenAI's "Her", Google has officially announced its own AI voice feature.
At the "Made by Google" keynote, Google unveiled Gemini Live, a new voice mode that will soon roll out in the Gemini mobile app.
The arms race between Google and OpenAI has begun again.
Look at OpenAI: the groundbreaking "Her" from three months ago is still silent today, and its thunder now seems about to be stolen by Google.
Rick Osterloh, the Google executive on stage, also remarked pointedly: "We have heard too many promises about AI and slogans about what is coming. Today, we are going to show real progress!"
In addition, during the presentation, Google also detailed how Gemini will be more deeply integrated into Android, applications, and new Pixel devices.
With the Pixel 9 series of phones released in one go, Google also explored a new form of "AI + phone": what kind of on-device AI product the fusion of Gemini, Android, and Pixel can incubate.
Now, with Android powered by AI, can Google beat Apple?
Google's "Her" is here too
According to Google, Gemini Live is a new mobile conversation experience.
If we want to brainstorm what kinds of jobs would suit our skills and education, Gemini can chat with us about it in real time.
It feels like having a caring assistant in your pocket who can chat with you at any time.
And like OpenAI's, Google's voice feature lets users converse with it in natural, conversational language, and it responds with lifelike voice and cadence.
Listen to the audio samples below: the timbres of the male and female voices are strikingly natural.
To make the experience as natural as possible, Google is launching 10 voices at once, so we can pick the tone and style we like best.
In addition, Gemini Live supports hands-free use: even with the Gemini app in the background or the phone locked, we can keep talking to it just as on a normal phone call.
We can also interrupt it at any time and change the topic. Sound familiar? Yes, OpenAI's voice mode does the same.
Meanwhile, OpenAI's advanced voice feature, the "Her" that amazed everyone in May, is still on hold; at the end of last month it was opened only to a small group of alpha testers.
In terms of speed, Google clearly beats OpenAI.
Gemini Live is available now on Android, through the Google Gemini app, for $19.99 per month.
The English version is currently available, and the iOS version and support for more languages will be launched in the coming weeks.
In terms of user scale, too, Google's advanced voice model will reach a far larger pool of potential users than OpenAI's.
You know, there are more than 3 billion Android users and 2.2 billion iOS users in the world today.
Part of the reason OpenAI's voice feature has yet to ship widely may be the model's unexpected behavior during red-team testing.
Has Google fully solved these safety issues? No one knows yet, but it is clear that Google, unwilling to be left behind, has decided to go all in this time.
But the demo failed, twice
The only blemish: there were a few hiccups during the Gemini Live demonstration.
When Google executive Dave Citron was demonstrating Gemini's new feature of connecting Google Calendar, Tasks and Keep on new Android phones, he didn't expect it to fail twice in a row.
He first took a photo of a poster for Sabrina Carpenter's fashion show in San Francisco with his mobile phone, and then asked Gemini, "Check my schedule to see if I'm free to attend Sabrina Carpenter's fashion show."
On the first attempt, Gemini returned an error and asked him to try again.
When he repeated the steps a second time, Gemini still did not respond.
Only on the third attempt, on a different device, did it finally produce the result, to a burst of cheers from the audience.
Redefining AI Assistant
In this speech, Google said: Through Gemini, they have reimagined what it means for personal assistants to be truly useful to humans - more natural, conversational, and intuitive.
Connect more apps
What are the most important keywords for a good AI assistant?
Connection.
That’s what Gemini does, integrating with all the Google apps and tools we use to accomplish tasks big and small.
And unlike other assistants, we don't have to waste time switching between apps and services.
In the coming weeks, Google will also launch new extended features, including Keep, Tasks, Utilities and YouTube Music.
What foods are in the picture? Ask Gemini, and it will list them all for you
Suppose we are hosting a dinner party; Gemini can show off its full range of skills:
From Gmail, it can pull up the lasagna recipe someone sent us and add the ingredients to a shopping list in Keep; then we can ask Gemini to put together a playlist that "reminds me of the late '90s."
In Google's upcoming calendar extension, we can directly take a picture of a concert poster and ask Gemini: Am I free that day? If the answer is yes, we can also ask Gemini to help us set reminders and prepare to grab tickets.
Need Gemini to write an email to your professor asking for leave and a few days' extension on a deadline? Just say so.
Summon Gemini with one click
Now, Gemini has been fully integrated into the Android user experience.
Only on Android can we experience such smooth, context-aware features.
As long as we have an Android phone, Gemini can appear when we need it, no matter what we want to do.
Just press and hold the power button, or say "Hey Google", to summon Gemini!
If you are using YouTube, you can ask Gemini questions about the video.
For example, suppose we are planning a trip abroad and have just watched a travel vlog. Tap "Ask this video" and ask it to list every restaurant that appears in the video and add them to Google Maps, and Gemini will work through them one by one.
Look at the picture below. The images generated by Gemini can be directly dragged and dropped into Gmail and Google Messages.
By now you can appreciate the beauty of this workflow.
Because Gemini has deep integration into Android, the AI can do more than just read the screen, but can also interact with many of the apps we already use.
Gemini 1.5 Flash powers the AI assistant
However, there are two problems. First, LLMs that are better at interpreting natural language and handling tasks often take longer to complete even simple ones.
Second, it is a headache when the AI exhibits unexpected behavior or provides inaccurate information.
To this end, Google has specially introduced a new model - Gemini 1.5 Flash.
It responds faster and provides higher quality answers.
In the coming months, Google will also integrate the model more deeply with Google Home, Phone, and Messages.
Google says today we’ve officially reached an inflection point where the usefulness of AI assistants far outweighs their challenges.
Based on Imagen 3, 2-second image generation
At the conference, Google also launched a new AI photo application - Pixel Studio.
Just a few words are needed to create a beautiful picture.
Most importantly, it is an on-device image generation app built on Imagen 3, able to generate all kinds of images within 2 seconds.
Today, the technical report of Imagen 3 was also released. The technical details can be found in the 32-page paper.
Paper address: https://arxiv.org/pdf/2408.07009
The first AI phone costs $20 per month
Google has embedded all of these AI capabilities into its latest mobile phone hardware.
At the scene, Google released a total of four AI mobile phones - Pixel 9, Pixel 9 Pro, Pixel 9 Pro XL, and the second-generation folding screen Pixel 9 Pro Fold.
What you must not miss on the new Pixel 9 series phones is the AI-powered photo-taking capability.
Google said the image processing algorithm, the HDR+ pipeline, has been completely rebuilt, giving photos better contrast, shadows, exposure, sharpness, and color.
Here are the new AI image editing capabilities added to the Pixel 9 series phones:
Add Me
Do you often end up as the photographer at family gatherings, team outings, or trips, only to find that you are the one person missing from the photos?
You won't have to worry about that anymore.
Google's "Add Me" feature can fill that gap.
First, take a group photo. Then the photographer swaps places with someone in the group, and that person takes a second photo that includes the original photographer.
Here, the Pixel uses real-time AR guidance to help the second shooter match the composition of the first photo.
Finally, the Pixel merges the two images, ensuring that everyone, photographer included, appears in one photo.
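Google has not published how the merge works; as a toy illustration only, the final step can be thought of as a masked composite, where pixels from the second shot replace pixels in the first wherever a mask marks the added person (alignment and seam blending, which the real feature needs, are omitted here):

```python
# Toy masked composite: copy pixels from photo_b into photo_a wherever
# the mask is set. A stand-in for Add Me's final merge step only; the
# real feature also aligns the two shots and blends the seams.
def composite(photo_a, photo_b, mask):
    """All arguments are same-sized 2D lists; mask holds 0 or 1."""
    return [
        [b if m else a for a, b, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(photo_a, photo_b, mask)
    ]

base  = [[10, 10], [10, 10]]   # first group shot (grayscale toy values)
extra = [[99, 99], [99, 99]]   # second shot containing the photographer
mask  = [[0, 1], [0, 0]]       # where the photographer appears
merged = composite(base, extra, mask)
# merged == [[10, 99], [10, 10]]
```

The AR composition guide in the step above exists precisely so that this kind of per-pixel merge has two well-aligned inputs to work with.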
Reimagine
Another Reimagine function is easy to understand.
It is a Magic Editor feature that lets you describe the effect you want directly in a text box.
AI can turn your ideas into reality.
For example, you can modify the background of the photo to create various scenes such as volcanoes, sunsets, and auroras.
Auto Frame
Auto Frame is a new feature in Magic Editor that allows you to recompose photos you have already taken.
It can even expand your photos, using AI to generate the missing background.
Zoom Enhance
Zoom Enhance automatically fills in the gaps between pixels and predicts detail, delivering high-quality zoomed-in shots.
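"Filling the gaps between pixels" is what any upscaler does; classical methods interpolate between known samples, whereas Zoom Enhance uses a model to predict plausible detail. As an illustrative classical baseline only (not Google's algorithm), here is a minimal bilinear upscale in pure Python:

```python
# Minimal bilinear upscaling: estimate the "missing" pixels between
# known samples. Illustrative baseline only; Zoom Enhance instead
# *predicts* detail with a learned model rather than interpolating.
def bilinear_upscale(img, factor):
    """img: 2D list of grayscale values; factor: integer scale > 1."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * factor, w * factor
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            sy, sx = y / factor, x / factor      # map back to source coords
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = sy - y0, sx - x0
            # Weighted average of the four surrounding known pixels.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bot * fy
    return out

small = [[0, 100],
         [100, 200]]
big = bilinear_upscale(small, 2)
# Interpolated pixels fall smoothly between their neighbors' values.
```

Interpolation like this can only blur what is already there, which is exactly why a predictive model is needed to add convincing detail at high zoom.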
The realization of AI capabilities is inseparable from the powerful chips behind the Pixel 9 series.
The most powerful AI processor: Google Tensor G4
The new phone uses Google's newly designed processor - Google Tensor G4.
Google said, "The Tensor G4 chip is our fastest and most powerful chip yet."
Based on last year's Tensor G3, Google teamed up with Samsung to create the semi-custom processor Tensor G4 based on the 4nm process, utilizing the CPU and GPU cores provided by Arm.
At the same time, it also uses Google's own modules to enhance AI, photography and security functions.
It is reported that, compared with the previous two generations, the G4 delivers 20% faster web browsing, 17% faster app launches, and 20% better power efficiency in everyday use.
In terms of CPU, the G4 is equipped with 1 Cortex-X4 core running at 3.1GHz, 3 Cortex-A720 cores running at 2.6GHz, and 4 Cortex-A520 cores running at 1.95GHz.
In comparison, the Tensor G3 has one 2.91GHz Cortex-X3 core, four 2.37GHz Cortex-A715 cores, and four 1.70GHz Cortex-A510 cores.
Even though the Tensor G4 has one fewer core, all of its cores are clocked 200MHz to 300MHz higher.
According to the leaked Geekbench scores, the Tensor G4 scored 2,005 in the single-core test and 4,835 in the multi-core test; the Tensor G3 scored 1,751 and 4,208 respectively. That is an uplift of roughly 14-15%.
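The year-over-year gain is easy to check from the leaked figures themselves (leaked, not official, numbers):

```python
# Compute the Tensor G4's relative Geekbench gains over the G3,
# using the leaked scores quoted above.
g4 = {"single": 2005, "multi": 4835}
g3 = {"single": 1751, "multi": 4208}

gains = {}
for test in ("single", "multi"):
    gains[test] = (g4[test] / g3[test] - 1) * 100  # percent uplift
    print(f"{test}-core: +{gains[test]:.1f}%")
```

Both tests land in the same range (about +14.5% single-core and +14.9% multi-core), consistent with the roughly 14-15% figure quoted.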
As for the GPU, the Tensor G4 uses the same ARM Mali-G715 GPU as last year's Tensor G3, but the frequency is increased from 890MHz to 940MHz. This means that the GPU performance of the Tensor G4 should be slightly better than that of the Tensor G3.
Adding new AI functions
AI is of course one of the main driving forces behind the Tensor Project.
The redesigned Tensor G4 is designed to power the latest Gemini and computational photography features.
The Gemini Nano model (maximum version with 3.5 billion parameters) can be run locally and output content at a rate of 45 tokens/s.
Although Google's TPU is fast, it is not clearly ahead of its competitors in token throughput.
In comparison, Qualcomm Snapdragon 8 Gen 3 can output 15 tokens per second when running 10 billion parameters; MediaTek Dimensity 9300 can run 7 billion parameters at 20 tokens per second.
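Raw tokens/s is hard to compare across chips running models of different sizes. One crude way to normalize, purely as a back-of-the-envelope exercise, is tokens/s multiplied by parameter count, a rough proxy for effective decoding throughput that ignores quantization, memory bandwidth, and model architecture:

```python
# Back-of-the-envelope comparison of on-device LLM decoding throughput,
# using the figures quoted above. tokens/s * parameter count serves as
# a crude compute proxy; it ignores quantization, memory bandwidth,
# and model architecture, so treat the result as indicative only.
chips = {
    "Tensor G4 (Gemini Nano)": {"tok_s": 45, "params_b": 3.5},
    "Snapdragon 8 Gen 3":      {"tok_s": 15, "params_b": 10.0},
    "Dimensity 9300":          {"tok_s": 20, "params_b": 7.0},
}

scores = {}
for name, c in chips.items():
    scores[name] = c["tok_s"] * c["params_b"]  # tok/s x billions of params
    print(f"{name}: {scores[name]:.0f}")
```

By this crude measure all three chips land in the same band (roughly 140 to 158), which matches the article's point that Google is fast but not clearly ahead.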
However, the Pixel 9 series' unique AI features may owe less to the new chip itself than to other factors.
AI also requires large amounts of memory and needs access to fast and large pools of memory to run more complex models.
The Pixel 9 comes with 12GB of RAM, and the Pro series upgrades to 16GB.
Google says that for a smoother AI experience, this is the first time it has set aside "a portion of dedicated RAM to run Gemini on a device," preventing other apps from using that memory.
However, Google did not disclose how much was allocated for AI tasks.
Although the chip itself has no major upgrades in AI, the optimization of RAM management may still bring better AI experience and new features.
References:
https://blog.google/products/gemini/made-by-google-gemini-ai-updates/
https://x.com/TechCrunch/status/1823410187404743131
https://venturebeat.com/ai/googles-ai-surprise-gemini-live-speaks-like-a-human-taking-on-chatgpt-advanced-voice-mode/
https://www.androidauthority.com/google-tensor-g4-explained-3466184/