
Nvidia's "AI-in-a-Box" gets a software and services upgrade as Jensen Huang builds a digital copy of the physical world

2024-07-30


Text: Li Haidan | Editor: Guo Xiaojing, Tencent Technology

On July 30, Beijing time, NVIDIA demonstrated a number of its latest advances in rendering, simulation, and generative AI at SIGGRAPH 2024, a top computer graphics conference held in Denver, USA.

At last year's SIGGRAPH, NVIDIA launched the GH200 chip, L40S GPUs, and ChatUSD. This year's protagonist is NVIDIA's new trump card for the generative AI era: Nvidia NIM, now broadly available. Through NIM, generative AI is being applied to USD (Universal Scene Description), broadening the possibilities of AI in the 3D world.

Nvidia NIM upgrade: both a blessing and a challenge

Nvidia announced that NIM now further optimizes and standardizes the otherwise complex deployment of AI models. NIM is a key link in Nvidia's AI strategy, and Jensen Huang has repeatedly praised the innovation it brings, calling it "AI-in-a-Box": essentially, artificial intelligence in a box.

This upgrade consolidates Nvidia's leadership in AI and strengthens an important part of its technological moat.

CUDA has long been considered a key factor in NVIDIA's leadership in the GPU field. With the support of CUDA, GPUs have evolved from pure graphics processors into general-purpose parallel computing devices, making modern AI development possible. However, despite NVIDIA's rich software ecosystem, these scattered systems remain too complex to master for traditional industries that lack basic AI development capabilities.

To solve this problem, Nvidia launched NIM (Nvidia Inference Microservices) cloud-native microservices at the GTC conference in March this year, integrating the software it has developed over the past few years to simplify and accelerate the deployment of AI applications. NIM packages models as optimized "containers" that can be deployed in the cloud, in data centers, or on workstations, allowing developers to complete their work in minutes, such as easily building generative AI applications for copilots, chatbots, and more.
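To make the "container as a service" idea concrete: a deployed NIM container exposes an OpenAI-compatible HTTP API, so calling it looks like calling any hosted LLM. The sketch below assembles such a request; the endpoint URL and model name are illustrative assumptions, not details from the article.

```python
import json

# A locally deployed NIM container typically listens on a port like this.
# The URL and model name are placeholders for illustration only.
NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "meta/llama-3.1-8b-instruct") -> dict:
    """Assemble an OpenAI-style chat-completion payload for a NIM service."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_request("Summarize OpenUSD in one sentence.")
body = json.dumps(payload)
# To actually call the service, POST `body` to NIM_ENDPOINT (e.g. with
# urllib.request, or point an OpenAI client at the local base URL).
```

Because the interface mirrors the OpenAI API, existing application code can often be repointed at a NIM container by changing only the base URL.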

So far, Nvidia's NIM ecosystem provides a series of pre-trained AI models, helping developers accelerate application development and deployment across multiple areas, including understanding, digital humans, 3D development, robotics, and digital biology:

In terms of understanding, NIM can use Llama 3.1 and NeMo Retriever to improve text-processing capabilities. In terms of digital humans, it provides models such as Parakeet ASR and FastPitch HiFiGAN, which support automatic speech recognition and high-fidelity speech synthesis, providing powerful tools for building virtual assistants and digital humans.

In terms of 3D development, models such as USD Code and USD Search simplify the creation and operation of 3D scenes, helping developers build digital twins and virtual worlds more efficiently;

In robotics, NVIDIA launched the MimicGen and Robocasa models, which accelerate the development and application of robotics by generating synthetic motion data and simulated environments. MimicGen NIM generates synthetic motion data based on teleoperation data recorded by spatial computing devices such as Apple Vision Pro. Robocasa NIM generates robotic tasks and simulation-ready environments in OpenUSD, a common framework for developing and collaborating in 3D worlds.

In digital biology, models such as DiffDock and ESMFold provide advanced solutions for drug discovery and protein-folding prediction, advancing biomedical research.

In addition, Nvidia announced that the Hugging Face inference-as-a-service platform is also powered by Nvidia NIM and runs in the cloud.

By integrating these versatile models, Nvidia's ecosystem not only improves the efficiency of AI development but also provides innovative tools and solutions. However, although the many upgrades to Nvidia NIM are genuinely good news for the industry, they also bring challenges for programmers.

Nvidia NIM greatly simplifies the development and deployment of AI models by providing pre-trained models and standardized APIs. This is a boon for developers, but does it also mean that employment opportunities for ordinary programmers will shrink further? After all, enterprises can complete the same work with fewer technical personnel, because these tasks have been pre-completed by NIM, and ordinary programmers may no longer need to carry out complex model training and tuning.

Teach AI to think in 3D and build a virtual physical world

NVIDIA also demonstrated the application of generative AI on the open USD and Omniverse platforms at the SIGGRAPH conference.

Nvidia announced that it has built the world's first generative AI models that can understand OpenUSD (Universal Scene Description) language, geometry, materials, physics, and space, and has packaged these models as Nvidia NIM microservices. Currently, three NIMs are available for preview in the Nvidia API catalog: USD Code, which answers OpenUSD knowledge questions and generates OpenUSD Python code; USD Search, which lets developers search a massive OpenUSD 3D and image database using natural language or image input; and USD Validate, which checks uploaded files for compatibility with OpenUSD releases and generates fully RTX-rendered, path-traced images using the Omniverse Cloud API.
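To ground what "OpenUSD language" means here, the sketch below writes a minimal hand-authored `.usda` scene of the kind USD Code is said to generate from natural-language prompts. The prim names, size, and color are illustrative values chosen for this example, not output from the actual service.

```python
# A minimal OpenUSD text-format (.usda) scene: layer metadata, an Xform
# root prim, and a Cube child prim with a display-color primvar.
usda_scene = """#usda 1.0
(
    defaultPrim = "World"
    upAxis = "Y"
)

def Xform "World"
{
    def Cube "Box"
    {
        double size = 2.0
        color3f[] primvars:displayColor = [(0.2, 0.5, 0.8)]
    }
}
"""

# Writing the text layer to disk yields a file that USD-aware tools
# (and a checker like USD Validate) can load directly.
with open("scene.usda", "w") as f:
    f.write(usda_scene)
```

Because `.usda` is a plain-text format, generated scenes like this are easy to diff, review, and validate before they enter a larger Omniverse pipeline.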

Nvidia said that as NIM microservices make OpenUSD more capable and accessible, industries of all kinds will be able to build physically based virtual worlds and digital twins. With new OpenUSD-based generative AI and Nvidia accelerated development frameworks built on the Omniverse platform, more industries can now develop applications for visualizing industrial design and engineering projects, and for simulating environments to build the next wave of physical AI and robots. In addition, new USD connectors link robotics and industrial simulation data formats and developer tools, enabling users to stream large-scale, fully Nvidia RTX ray-traced data sets to Apple Vision Pro.

In short, bringing USD into Nvidia NIM is a valuable step toward understanding the physical world and building virtual worlds, and their digital assets, with large models. For example, in 2019 Notre-Dame de Paris suffered a serious fire that destroyed a large part of the cathedral. Fortunately, Ubisoft's game designers had visited the building countless times and studied its structure, completing a digital reconstruction of Notre-Dame: the AAA game "Assassin's Creed: Unity" reproduced the cathedral in detail, which greatly aided the restoration effort. It took designers and historians two years to build that reproduction, but with this technology we can accelerate the creation of digital copies at scale, using AI to understand and reproduce the physical world in a more refined way.

For example, designers can build basic 3D scenes in Omniverse and use them to condition generative AI, achieving a controllable, collaborative content-creation process. WPP and Coca-Cola are among the first to adopt this workflow to scale their global advertising campaigns.

Nvidia also announced the upcoming launch of several new NIM microservices, including USD Layout, USD Smart Material, and FDB Mesh Generation, to further enhance the application capabilities and efficiency of developers on the open USD platform.

NVIDIA Research presented more than 20 papers at the conference, sharing innovations in synthetic data generators and inverse rendering tools; two of them won Best Technical Paper awards. AI improves simulation by raising image quality and unlocking new 3D representations, while improved synthetic data generators in turn advance the state of AI. These studies demonstrate Nvidia's latest progress and innovation in AI and simulation.

Nvidia said that designers and artists now have new and improved ways to increase productivity using generative AI trained on licensed data. For example, Shutterstock (a US stock-image provider) has launched a commercial beta of its generative 3D service, which lets creators quickly prototype 3D assets from text or image prompts alone and generate 360° HDRi backgrounds to light a scene; and Getty Images (a US stock-image marketplace) has accelerated its generative AI service, doubling the speed of image generation and improving output quality. These services are built on Nvidia Edify, a multimodal generative AI architecture whose new model doubles generation speed, improves image quality and prompt accuracy, and lets users control camera settings such as depth of field or focal length. Users can generate four images in about six seconds and upscale them to 4K resolution.

Conclusion

Whenever Jensen Huang appears on a major stage, he is always in his leather jacket, describing to the world the exciting future AI will bring.

We are also experiencing NVIDIA's growth, witnessing NVIDIA's step-by-step transformation from a gaming GPU giant to an AI chip overlord, and then to a full-stack layout of AI software and hardware. NVIDIA is full of ambition and is rapidly iterating at the forefront of the AI technology wave.

From programmable shading GPUs and CUDA accelerated computing, to the launch of Nvidia Omniverse and generative AI NIM microservices, to advances in 3D modeling, robotic simulation, and digital twin technology, a new round of innovation in the AI industry is coming.

However, large companies, with more capital, technology, and manpower, can adopt and implement advanced technologies such as Nvidia NIM more quickly, while small and medium-sized enterprises may struggle to keep pace due to limited resources. Coupled with differing levels of talent and technology, will this lead to greater technological inequality in the future?

The ideal AI for humans is to help humans free their hands and labor, and bring humans a world with higher productivity. But when productivity and means of production are controlled by a small number of people, will it lead to a deeper crisis? These are all questions we need to think about.