Doubao PC version "unboxing", from voice to dialect

2024-08-24

On August 22, the Volcano Engine AI Innovation Tour opened in Shanghai. The event showcased the Doubao model's improvements in overall benchmark scores, speech recognition, and other areas. Voice capabilities were the focus of this release.

The large model team focused on real-time conversational AI interaction and produced Seed-ASR, an achievement that may be comparable to the new advanced voice mode for ChatGPT that OpenAI released on July 31.

According to videos posted on social media at the time, OpenAI employees could interrupt the chatbot and ask it to tell a story differently, and the chatbot took their interruptions in stride and adjusted its responses.

In short, it supports "thinking while speaking". With stronger contextual awareness, it can reason better and respond more accurately.

Notably, Doubao claims that its voice capability uses a single model to recognize Mandarin as well as Cantonese, Shanghainese, Sichuanese, Xi'an dialect, Minnan dialect, and many other Chinese dialects.

This makes me eager to try speaking to it with a Hong Kong accent and a Sichuan lilt.

Next, using version 1.19.5_mac of the Doubao AI PC client, I will test AI text reading, screenshot recognition, and the recently popular AI video watching and AI dialect recognition, among other functions, to see what Doubao offers that is new compared with the web versions of other companies' large AI models.

As usual, those who are in a hurry can scroll down directly to the summary section.

AI Text Reading

First up is the AI text reading companion.

I opened a news article, scrolled down to the summary, and selected the paragraph I wanted help with. Doubao automatically popped up options such as search, translate, interpret, and copy.

Under "Discover more skills" is the AI text-selection toolbar. It offers 6 functions such as text expansion, abbreviation, correction, and polishing; 3 functions such as rewriting into social media copy or a video script; 4 functions such as generating weekly reports, OKRs, and code error correction; and 6 functions such as summarizing pros and cons, extracting task items, and brainstorming. Counting those that are hard to classify, there are 22 module functions in total, all of which can be customized and pinned to the top.

I picked the most basic function and, after waiting about 25 seconds, got the following result.

As you can see, Doubao first summarizes the main idea and then gives a more conversational, plain-language explanation. What is striking is that it proactively identifies and explains the proper nouns in the selected passage, such as the "Pareto rule" above.

It remains to be seen whether the 22 module functions can deliver deeper intelligence and personalization. What is clear is that with the PC client running in the background, I no longer need to copy and paste text into another window to search, or pull out the proper nouns to look up or ask about separately.

AI Image Recognition

When I take a screenshot with Doubao, a pop-up offers three functions: "Solve and answer questions", "Translate", and "Ask Doubao". I chose a high school math problem and asked Doubao to solve it and answer my questions.

Doubao not only provides the solution process and answer for the question in the screenshot area, but also provides several similar questions and their answers.

But when I used "Translate" and "Ask Doubao", it not only failed to segment sentences intelligently but also made frequent mistakes.

Considering the difficulty of image recognition, I switched to paragraph-sized text, but there was no improvement.

I tried "Ask Doubao" again, testing both of its modules separately: "Organize the core content of the picture" and "Extract text".

In general, the core content organization function performs well. Text extraction, however, failed to recognize the whole image, even though the text in it was neatly typeset in a print font.

AI Video Watching

The AI video watching function is currently limited to Bilibili videos. It requires opening the video inside the Doubao interface and logging in to a Bilibili account.

So I randomly picked Season 3, Episode 7 of "The School of Late Drink" and, after waiting about 20 seconds, got the following result.

As you can see, in the segmented video timeline the AI's matching of images to text is not accurate, but it basically manages to segment the content.

The video has Japanese audio and Traditional Chinese subtitles, which probably made things harder for Doubao.

The video clearly states its main idea at the beginning, but that idea is not clearly reflected in the text summary on the right. In addition, in the "Gratitude to Others" section, the character in the video thanks Miss Yuroe rather than Mr. Ushida, so Doubao's summary gets this wrong.

AI Dialect Recognition

According to the official announcement, Doubao supports Cantonese, Shanghainese, Sichuanese, Xi'an dialect, and Minnan dialect. Next, let's see whether Doubao can recognize my poor Cantonese. (My hometown dialect is not supported, so all I have is the shaky Cantonese I picked up from living in Hong Kong for half a year. I look forward to more native speakers sharing their experience.)

Speech recognition works without a problem. Doubao understood "I want to eat porridge hot pot" and even offered a search option: "Where can I find delicious porridge hot pot in Beijing?" However, after I sent the message, it jumped to the AI search dialogue interface, and the reply I received was text rather than voice.

In addition, dialect input is only available on the homepage, and I cannot keep speaking in dialect inside the conversation interface. I therefore have to return to the homepage again and again, and every message I send opens a new navigation page window.

Still, being able to input in dialect at all is a big breakthrough, even if the overall performance is unsatisfactory. Reportedly, the Doubao mobile app supports voice replies.

I tried entering the same sentence in dialect on the mobile app. Doubao replied to me in Mandarin voice and offered the search option "Where can I find delicious porridge hot pot in Beijing?"

In other words, Doubao supports dialect input but does not currently support dialect interaction. For now, the function is mainly useful for fun and for business purposes, such as organizing meeting minutes for participants who speak dialects.

Summary

In my imagination, there is an AI electronic doll on my desktop that offers emotional value the way my cat does and genuinely helps me handle everything. It is as easy to wake as Siri, but more powerful.

Doubao's AI text reading companion works across applications on PC and provides 22 module functions. Beyond basic text polishing, it also covers scenarios for office workers, programmers, and content creators. It has the fundamentals I imagined, but also a lot of room to explore and grow.

In image recognition, it is good at solving problems and answering questions, which makes it roughly a PC equivalent of the Moubang and Mouyuan homework apps. Given the PC user base, however, I hope Doubao will go deeper into advanced mathematics and beyond. After all, for ordinary homework and test papers it is faster to solve problems on a phone; demand on PC arises only for electronic problem sets or papers.

The segmentation and summarization functions for AI video watching are very eye-catching, and Doubao has great potential with popular science videos in particular. Humanities and social science content, by contrast, is a common problem for all major models.

In fact, AI dialect is the feature I look forward to most. After all, "the accent of a hometown remains unchanged even when one's hair turns gray." Hometown is sometimes a long menu and sometimes a familiar "taste." Overall, though, Doubao's dialect interaction ecosystem still has a long way to go.

Dialect dialogue does more than speak to modern city dwellers' feelings for their hometowns. More importantly, it lets technology reach past the cold screen and care for those who cannot speak the universal "Chinese". They write silent history with their lives, yet are often forgotten by history. They, too, need AI and all the value that comes with it.

When dialects move from recognition to interaction, Doubao may also be able to go further.