The first domestically produced large model with eyes debuted at the China International Fair for Trade in Services

2024-09-13

A domestically developed large model has successfully unlocked its "eyes." Zhipu AI (Beijing Zhipu Huazhang Technology Co., Ltd.) demonstrated the newly released "video call" feature of its Qingyan app at the 2024 China International Fair for Trade in Services (hereinafter the Services Trade Fair), which opened on September 12. This is reportedly the world's first large model video call feature open to the general public.

According to reports, Qingyan's video call feature spans three modalities (text, audio, and video) and supports real-time reasoning. Users can turn on the camera and talk with Qingyan through the video call window. Qingyan can not only "see" what is on the user's screen but also understand instructions and carry them out accurately, and it responds quickly even when frequently interrupted. When the user highlights key points on the screen, Qingyan also understands where the user wants it to focus.

Zhipu AI said that GPT-4o had previously launched a voice feature but had not opened its video feature to the public, and that Qingyan lets users experience the most cutting-edge AI and large model technology. In the past, interaction with AI was mainly text-based. With the video feature, users can say goodbye to long text prompts and communicate with AI smoothly. Qingyan is like a human assistant at the user's side with sharp eyes and a good understanding of speech. Once the camera is on, the AI can perceive the user's surroundings and work out what the user wants it to do. The user only needs to give verbal instructions.

The Qingyan video call feature can also serve as a portable English translator: it can translate between Chinese and English in real time, hold English conversations based on the user's surroundings, and help correct the user's pronunciation and grammatical errors. Use cases include explaining the stories behind scenic spots during travel and describing the surroundings for visually impaired users. It can also provide homework help across school subjects, offer interview coaching, produce meeting minutes and summaries, analyze complex data charts, and interpret code on a computer screen in real time.

At the Services Trade Fair, Zhipu AI is showcasing a number of its latest products. In addition to the video call feature of the Qingyan app, these include an AI video generation feature. Zhipu launched the video generation model CogVideoX earlier this year and released "Qingying," a video generation feature, on the Zhipu Qingyan app. Qingying is open to all users: by entering text or uploading a picture, they can generate a 6-second video within 30 seconds. After Qingying launched on the Qingyan app in July this year, users generated more than 1 million videos in six days.

The Paper reporter Zhang Jing

(This article is from The Paper. For more original reporting, please download "The Paper" app.)
