
We made a video call to the AI and found that it seemed able to talk about everything...

2024-08-31



At the OpenAI conference three months ago, you probably got a look at GPT-4o and its smooth video conversation capabilities, which felt just like talking to a real person.

There is also Project Astra, launched by Google, which seems to be just as powerful as GPT-4o.


During that period, almost the entire internet was raving about how far AI's interactive capabilities had evolved, and words like "epic" and "next level" were thrown around.

The result? The video call function promised for GPT-4o has been delayed again and again, and Project Astra has not been seen for several months. One by one, they have nearly been turned into "black carp" by Mr. Bad Review...

But I have noticed what seems to be a rule in AI circles: good things cannot keep everyone waiting too long. Look at Sora: it was kept under wraps for half a year, and then Keling, Luma AI, and Zhipu Qingying all emerged.

Just in the past two days, at the KDD data mining conference in Barcelona, Zhipu, in front of academia and industry from around the world, not only released its latest base large model, GLM-4-Plus, but also upgraded the video call function of Zhipu Qingyan.


Let me highlight this for you all: the legendary AI video call function, the one that can both see and chat, can now be used directly in the Qingyan app. You can download the app and apply for a trial first.

Other things aside, compared with OpenAI, Zhipu is already moving much faster...

So when Qingyan's video call function launched, I tried it right away. Open the Qingyan app, tap the call button in the lower right corner, switch to video once you are in, and start playing right away ~


As you probably know, one important reason GPT-4o has been so hyped is its very strong ability to understand video.

So the most basic thing is to test Qingyan's video comprehension first, right?

Mr. Bad Review showed Qingyan the conference room where our editorial department usually brainstorms, to see whether it could guess what he was doing from the surroundings. He even shook the camera on purpose so that the picture was never completely still.

Guess what: Qingyan opened with "wow, guess what you are doing," and I almost lost it. But it was right that I was sitting at a table in a conference room, and it described the paper cups on the table, the remote control, and the TV nearby quite accurately.

When I pointed the camera at my post-production colleague's computer, it could actually tell that a video was being edited.

You know what, this ability to perceive the overall environment is something I had only seen in OpenAI and Google demos before. Experiencing it myself today really felt like science fiction coming true.

Moreover, just as in the GPT-4o demonstration, you can interrupt Qingyan at any time. It also uses interjections like "ouch" and "ah" from time to time and chuckles before speaking, just like chatting with a real person.

Next, I tried recognition of specific objects to see how broad Qingyan's knowledge is.

I started with the simplest task, scanning a workstation. Large items such as the white keyboard, black mouse, and monitor were basically never missed, and the relative positions of the objects (front, back, left, right) were described clearly. Even details such as the plugged-in headphones and the cartoon characters on the glass were caught.

I can't say 100%, but Qingyan saw at least 80%-90% of what was on this table.

Qingyan also has another function: circle recognition. I circled my colleague's big speaker from a distance, and it knew the brand, the model, and even its specific purpose.

There is lettering on the speaker, but as you can see from the image quality, it is hard to make out with the naked eye. I have to say that Qingyan has very sharp eyes...

I also found that Qingyan's recognition of objects does not stop at simple categories.

For example, if you ask whether this game controller is from Sony or Microsoft, it can analyze the controller's appearance and tell you that it is Microsoft's Xbox, instead of just telling you that it is a game controller or fobbing you off with a vague explanation.

Then there is this old Nokia feature phone. It identified the specific model as the N95, a classic from 2007, and had no trouble at all with these details.

Later, I asked Qingyan to identify a computer's operating system, guess Tony's age from his photo, and name a celebrity from a photo... Let's just say that since getting the hang of Qingyan, I want to start a video call and ask about anything I get my hands on.

Of course, fun is fun, but video calls also have plenty of practical scenarios.

Take our editorial department's routine of hunting for topics and reading up on material as an example. The hot topics in the automotive world these days are basically all tied to the Chengdu Auto Show. So we can ask Qingyan about the Chengdu Auto Show and look for topic ideas in the conversation.

I also found that Qingyan has a memory function. I had talked about this with it during a previous video call, and the next time I opened it, it asked me which new energy vehicles at the auto show I was interested in.

There is also homework tutoring, the biggest headache for parents. Previous AI interactions relied on photographing and uploading the problem, but with video calls, the logic becomes the same as one-on-one homework tutoring with an online tutor.

I had Qingyan try some easy math problems. It can more or less handle simple algebra problems at the elementary and junior high level.

You may have noticed that when solving a problem, Qingyan does not lay out the whole process at once, nor does it just give the result. It guides you step by step, so that you think it through yourself.

In addition to mathematics, I also tried Chinese and English. Qingyan may not be a senior teacher, but it is perfectly adequate for doing homework, memorizing vocabulary, and reciting classical poems.

If you think these scenarios are not enough, let's use a little more imagination.

Cooking for the first time with no experience, a broken light bulb in the room, not knowing how to care for houseplants... If you run into things like these in daily life and don't know what to do, why not ask Qingyan?

For example, many children may not yet be able to tell a battery's positive and negative poles apart, so we pretended to install a calculator battery backwards. Qingyan pinpointed the problem in just a few words, which shows that it has plenty of common sense.

Anyway, I have been using Qingyan these days, and I like to play with it whenever I have time. This little thing is also quite good at providing emotional value: ask it to tell stories or jokes, and it responds to every sentence.

Play blackjack with friends, and it can even act as the referee.

I wonder whether it will evolve to the point where AI can fill the empty seat when a mahjong table has three players and needs a fourth. Isn't this much more interesting than the earlier typed and voice conversations?

To be honest, this major upgrade to Qingyan brought me a lot of surprises, but there are still many small flaws. It makes slips of the tongue, misidentifies things, and outputs some nonsense.


For example, when it was refereeing blackjack, it once called a 9 a 4 and called spades clubs...

Still, just for getting ahead of OpenAI and Google and bringing AI video calls to China, we have to give Zhipu a thumbs up.

The video call function is initially available only to some users, and Zhipu will gradually expand access so that everyone can use it as soon as possible. You can download the Qingyan app, or log in to the PC version (chatglm.cn), and apply for the beta test there.

In addition, many readers may not be familiar with Zhipu. Let's put it this way: this company is red-hot right now.


Especially this year, their moves on large models have been very aggressive. From rapid iteration of base large models to frequent rollouts of large model applications, Zhipu has never stopped.

At this KDD, their new large model, GLM-4-Plus, showed great improvements in language understanding and long-text processing.


Moreover, Zhipu has been committed to open-sourcing its models. According to the data, cumulative downloads of Zhipu's open-source models have exceeded 20 million.

Anyway, with Zhipu taking the lead in "stirring things up" this time, AI circles in China and even overseas are expected to usher in another wave of frenzied product launches.


This is a good thing for us users. Especially with the new interactive form of AI video calls, there will be even more room to imagine application scenarios in the future.

For example, if AI is built into glasses or necklaces, mobile phones may no longer be needed in the future. Or it could be built into a blind person's cane to help guide the way. Or it could be combined with embodied intelligence so that robots truly understand what they see.

To borrow the words of Zhang Peng, CEO of Zhipu: "At least we have not yet seen the ceiling of (AI) technology."

You might as well use your imagination to think about how AI will evolve in the future and what value it will create.

Written by: Xixi

Edited by: Jiang Jiang

Art: Huan Yan

Image sources: Zhipu Qingyan; some pictures are from the internet