
Google releases Gemini Live: supports AI voice chat and can simulate interview scenarios

2024-08-14


IT Home reported on August 14 that at today's Pixel 9 series launch event, Google announced the Gemini Live service, which begins rolling out today in English to Gemini Advanced subscribers.


Promoting natural, fluid conversations

Google says Gemini Live offers a mobile conversational experience that lets users speak with Gemini freely and fluidly.

Gemini Live is effectively Google's counterpart to OpenAI's newly launched Advanced Voice Mode for ChatGPT (currently in a limited alpha test). It uses an enhanced voice engine to enable more coherent, emotionally expressive, and realistic multi-turn conversations.


Google says users can interrupt the chatbot while it’s speaking to ask follow-up questions, and the chatbot will adapt to the user’s speaking patterns in real time.

IT Home translated part of the Google blog post as follows:

Through Gemini Live [in the Gemini app], users can talk to Gemini and choose from [10 new] natural-sounding voices for its responses. Users can even speak at their own pace or interrupt mid-answer to ask clarifying questions, just like in a human conversation.

Google demonstrated a scenario in which Gemini Live simulates a conversation between the user and a hiring manager (played by the AI), offering the user tips on presentation skills and suggestions for improvement.

A Google spokesperson said:

Live uses our Gemini Advanced model, which we have tuned to be more conversational. Its large context window comes into play when users have long conversations with Live.

Multimodal input not yet supported

Gemini Live doesn't yet have one of the features Google showed off at I/O: multimodal input.

In May, Google released a prerecorded video showing Gemini Live observing the user's surroundings through photos and video captured by the phone's camera and reacting to them, for example by naming the parts of a broken bicycle or explaining what a section of code on a computer screen does.

Google said multimodal input will be available "later this year," but declined to provide specific details.