
Agora's new book "Understanding Real-Time Interaction" records the past, present and future of RTE

2024-09-06


As infrastructure for future digital life, real-time engagement (RTE) has reached into social networking, entertainment, work, shopping and other parts of people's lives, and has driven value growth across industries. Even in the current AIGC boom, real-time interaction plays an important role, moving the interaction between people and AI from text to multimodal audio and video.
But what exactly is real-time interaction? How did it evolve from real-time communication (RTC), that is, real-time audio and video? What are the technical principles behind it? In how many application scenarios has RTE been deployed so far? What technical difficulties remain in achieving real-time voice conversations with large models?
On August 27, "Understanding Real-Time Interaction", a real-time interaction industry book compiled by the Agora Research Institute team and published by the Machinery Industry Press, was officially released and went on sale. It is also the industry's first popular science book on technology to introduce real-time interaction systematically.
The book is jointly recommended by Jixun Foo, senior managing partner of Granite Asia; Jiang Tao, founder and chairman of CSDN; Liu Qin, founding partner of 5Y Capital; Liu Chengcheng, founder and chairman of 36Kr; Li Donghong, professor at Tsinghua University; Ma Siwei, professor at Peking University; Xie Lei, professor at Northwestern Polytechnical University; and Wu Lianfeng, vice president and chief analyst of IDC China.
To understand real-time interaction, just read this book.
"Understanding Real-Time Interaction" describes in detail the past, present and future of real-time interaction. It covers the development history, concept analysis, technical principles, application scenarios, and big data observation of real-time interaction, and is divided into five parts.
·   Chapter 1: Review and insight into the past, present and future of real-time interaction
From the birth of the world's first working VoIP phone in 1996 to today's real-time voice conversations between humans and AI, what scenario evolution, technology upgrades, and corporate innovations took place along the way? From Werewolf games, live streaming, online education, interactive podcasts, and the metaverse to this year's popular AIGC, Agora analyzes the technology upgrades and industry changes of real-time interaction from the perspective of an underlying audio and video service provider.
For example, how did real-time interaction become a standard feature of pan-entertainment applications? With the support of RTC functions, how can IoT devices achieve the interaction of all things? From the first appearance of the concept of video conferencing in 1964 to today, when developers can embed video conferencing in any app, how did video conferencing grow from a single-purpose communication tool into a universal capability?
·   Chapter 2: Real-time interaction and analysis of related concepts
We talk about RTC and RTE all the time, but do you really understand the difference between them? And then there are PaaS, SaaS, IaaS, and aPaaS. Are you still racking your brains to tell these concepts apart?
After reading this chapter, you should have a clearer and firmer understanding of these concepts.
·   Chapter 3: Analysis of the real-time audio and video technology pipeline
Now comes the highlight. If you are new to the industry, this chapter is the best place to learn about real-time audio and video technology. From audio and video capture, pre-processing, encoding, and transmission to audio and video post-processing, we worked with many technical experts from Agora to describe each stage at length and in detail.
In this chapter, you can see the best practices for audio 3A processing and voice beautification in audio and video pre-processing; how to implement beauty filters in scenarios such as live streaming and video calls; how video codec standards such as VP8, VP9, H.264, H.265 and AV1 differ in their application in RTC; and more.
·   Chapter 4: In-depth analysis of 200 real-time interaction application scenarios
At present, the AIGC industry is in full swing. With the support of real-time interaction, the interaction between people and AI has been upgraded from text to multimodal audio and video, and many conversational AI scenarios have emerged, such as AI voice assistants, AI spoken-language tutors, AI game NPCs, and AI virtual partners. Taking AI voice assistants as an example, by combining multimodal large models with RTC, users can have one-on-one real-time voice conversations with AI assistants. Prompts (instructions given to the AI) give the assistant a rich personality, and combined with RTC's ultra-low-latency transmission, the AI can interact and help like a real assistant.
This type of conversational AI scenario involves many technical difficulties. For example, AI voice conversations usually have high latency, which seriously degrades the conversation experience and places heavy demands on low-latency RTC transmission. The delay usually has to stay within 1-2 seconds for a human-computer conversation to feel natural and smooth. At the same time, in streaming conversations, nearby voices and noise can easily interfere with the human-computer conversation. Audio functions such as AI VAD (voice activity detection), AGC (automatic gain control), and AINS (AI noise suppression) are needed to suppress surrounding noise and capture the full meaning of what the user says, making speech recognition more complete and accurate.
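The 1-2 second target above can be read as a budget shared by every stage between the user finishing a sentence and the AI starting to reply. The following is a minimal sketch of that bookkeeping in Python. The stage names and all millisecond values are hypothetical placeholders chosen for illustration, not measurements from the book or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step in a voice-conversation round trip (illustrative only)."""
    name: str
    latency_ms: float


def total_latency_ms(stages):
    """Sum the latency of all stages, assuming they run one after another."""
    return sum(stage.latency_ms for stage in stages)


def within_budget(stages, budget_ms=2000.0):
    """Check the summed latency against a budget (default: the 2 s upper bound)."""
    return total_latency_ms(stages) <= budget_ms


# Hypothetical per-stage numbers, assumed for this example.
pipeline = [
    Stage("uplink audio transport (RTC)", 100),
    Stage("end-of-speech detection (VAD)", 300),
    Stage("speech recognition", 300),
    Stage("large model, time to first token", 600),
    Stage("speech synthesis, time to first audio", 300),
    Stage("downlink audio transport (RTC)", 100),
]

if __name__ == "__main__":
    for stage in pipeline:
        print(f"{stage.name:40s} {stage.latency_ms:7.0f} ms")
    print(f"{'total':40s} {total_latency_ms(pipeline):7.0f} ms")
    print("within 2 s budget:", within_budget(pipeline))
```

Summing the stages this way assumes they run strictly in sequence. In a streaming design some stages overlap, so the sum is a pessimistic estimate, but it shows why the transport legs need to stay small when the model stages dominate.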
In addition to newly emerging conversational AI scenarios and familiar ones such as live shows, voice chat rooms, and online karaoke, the book also introduces many IoT scenarios that you may not know, such as cloud racing in the field of parallel control. Cloud racing is built on cloud computing and artificial intelligence and is mainly used by enthusiasts and professional drivers for remote-driving races. By controlling the car through the cloud platform, drivers can take part in a race from home.
In this type of scenario, the technical difficulties center on latency and transmission. The racing car moves fast, so latency requirements are strict: real-time video must be delivered with low latency and stable transmission, and the system must also be able to send back multiple high-definition video streams at once.
·   Chapter 5: Real-time interaction big data observation
How are users around the world interacting in real time? If you operate an app or follow this industry, you probably care about real-time audio and video big data, such as the ranking of device models using RTC in popular regions around the world, the share of network types commonly used by those models, the share of audio and video usage by model, and RTC usage in popular regions. In addition, how does the audio and video freeze rate affect user session time and retention, and how can it be optimized? RTC industry practitioners will not want to miss this content.
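As an illustration of the freeze-rate metric mentioned above, here is a minimal Python sketch. It assumes one common style of definition: the share of playback time spent in gaps between rendered frames that exceed a threshold. Both this definition and the 200 ms threshold are assumptions made for the example and are not taken from the book.

```python
def freeze_rate(frame_gaps_ms, freeze_threshold_ms=200.0):
    """Return the fraction of playback time spent in freezes.

    frame_gaps_ms: intervals between consecutive rendered frames, in ms.
    A gap counts as a freeze when it exceeds freeze_threshold_ms
    (the threshold is an assumed value for illustration).
    """
    total_ms = sum(frame_gaps_ms)
    if total_ms <= 0:
        return 0.0  # no playback time recorded, so nothing froze
    frozen_ms = sum(gap for gap in frame_gaps_ms if gap > freeze_threshold_ms)
    return frozen_ms / total_ms


# Hypothetical session: 25 fps video (40 ms gaps) with one 600 ms stall.
gaps = [40.0] * 20 + [600.0] + [40.0] * 15

if __name__ == "__main__":
    print(f"freeze rate: {freeze_rate(gaps):.1%}")
```

A per-session number like this could then be compared against session length or retention, which is the kind of relationship the chapter describes.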
The book also comes with rich and practical supporting resources, including electronic documents such as charts, maps, industry development reports and white papers related to real-time interaction. It is packed with useful information and well worth the price. To obtain the resources, scan the QR code on the back cover to open the book's dedicated cloud drive and download them.
Where does "Understanding Real-Time Interaction" come from?
As the real-time interaction industry evolves rapidly and usage scenarios diversify, Agora found that no book in the industry systematically introduces real-time interaction from the perspectives of development history, application scenarios, and technical architecture. Only a few books cover WebRTC, and only at the level of technical architecture. Many people do not know how RTC (real-time audio and video) and RTE (real-time interaction) relate to and differ from each other. It is therefore especially important right now to build and spread a full understanding of real-time interaction.
This year marks the 10th anniversary of Agora's founding. As a pioneer of the global real-time interaction cloud industry, Agora has always held to the mission of "helping people interact in real time across distance, as if gathered together". It is committed to comprehensively improving people's real-time interaction experience through high-quality real-time audio and video technology services, empowering social networking, education, finance, healthcare and other industries, and promoting economic and social development. Agora has both the responsibility and the obligation to promote the popularization of real-time interaction.
In response, Agora launched the book "Understanding Real-Time Interaction: In-Depth Interpretation of Audio and Video Technology, Scenarios and Data" on its 10th anniversary. The book gives a comprehensive and systematic explanation of real-time interaction across development history, technical principles, application scenarios, big data observation and other dimensions. We hope that by reading it, readers can gain a deep understanding of real-time interaction and master the relevant knowledge, and that more people will be encouraged to join the industry and advance it together.
"Understanding Real-Time Interaction" is now available on major e-commerce platforms such as JD.com and Dangdang, as well as in physical bookstores. Please stay tuned.