NVIDIA builds generative AI foundry and accelerates deployment

2024-07-24


Zhidongxi
Author: ZeR0
Editor: Mo Ying

Zhidongxi reported on July 24 that NVIDIA announced the launch of the new NVIDIA AI Foundry service and NVIDIA NIM inference microservices, which, together with the newly released Meta Llama 3.1 family of open models, give enterprises worldwide strong support for generative AI.

The Llama 3.1 large language models are available in 8B, 70B, and 405B parameter sizes. They were trained on more than 16,000 NVIDIA Tensor Core GPUs and are optimized for NVIDIA accelerated computing and software, whether running in the data center, in the cloud, or locally on workstations with NVIDIA RTX GPUs and PCs with GeForce RTX GPUs.

Just as TSMC is a foundry for global chip companies, NVIDIA has also built an enterprise-level AI foundry, NVIDIA AI Foundry.

“Meta’s open source Llama 3.1 model marks a critical moment in the adoption of generative AI by enterprises around the world,” said Jensen Huang, founder and CEO of NVIDIA. “Llama 3.1 will unleash a wave of companies and industries to create advanced generative AI applications. NVIDIA AI Foundry has integrated Llama 3.1 throughout the process and can help companies build and deploy custom Llama super models.”


Powered by the NVIDIA DGX Cloud AI platform and co-engineered with the world’s leading public clouds, NVIDIA AI Foundry provides an end-to-end service for quickly building custom super models. It is designed to provide enterprises with massive computing resources that can easily scale as AI needs change.

“With NVIDIA AI Foundry, companies can easily create and customize the most advanced AI services they want and deploy them through NVIDIA NIM,” said Mark Zuckerberg, founder and CEO of Meta.

Enterprises that need more training data to create domain-specific models can combine their own data with synthetic data generated by Llama 3.1 405B and the NVIDIA Nemotron Reward model to train these super models and improve their accuracy. Customers with their own training data can use NVIDIA NeMo to customize Llama 3.1 models and further improve accuracy through domain-adaptive pre-training (DAPT).
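The synthetic-data loop described above amounts to generating candidate responses with a large model and keeping only those a reward model scores highly. The sketch below illustrates that shape with placeholder functions; `generate_candidates` and `reward_score` are hypothetical stand-ins for calls to Llama 3.1 405B and a Nemotron reward model, not real APIs.

```python
# Minimal sketch of reward-filtered synthetic data generation.
# `generate_candidates` and `reward_score` are hypothetical stand-ins
# for a Llama 3.1 405B generator and a Nemotron reward model.

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Placeholder: a real system would sample n completions from the LLM.
    return [f"{prompt} -> draft answer {i}" for i in range(n)]

def reward_score(prompt: str, response: str) -> float:
    # Placeholder: a real reward model returns a scalar quality score.
    return float(len(response))  # toy heuristic for illustration only

def build_synthetic_dataset(prompts: list[str], threshold: float) -> list[dict]:
    """Keep the best-scoring candidate per prompt, if it clears the bar."""
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt)
        best = max(candidates, key=lambda r: reward_score(prompt, r))
        if reward_score(prompt, best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset

data = build_synthetic_dataset(["What is RAG?"], threshold=1.0)
print(len(data))
```

The filtered prompt/response pairs would then feed a NeMo fine-tuning or DAPT run in a real pipeline.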

NVIDIA and Meta are also working together to provide a distillation method for Llama 3.1 that allows developers to create smaller custom Llama 3.1 models for generative AI applications. This enables enterprises to run Llama-powered AI applications on more accelerated infrastructure, such as AI workstations and laptops.
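Distillation here means training a smaller student model to imitate a larger teacher's output distribution. As a rough illustration of the core idea only (not NVIDIA and Meta's actual method), the standard soft-label objective compares temperature-softened teacher and student token distributions:

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    # KL(teacher || student) on temperature-softened distributions,
    # the classic soft-label loss in knowledge distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; mismatched logits give positive loss.
print(distillation_loss([3.0, 1.0, 0.5], [1.0, 1.0, 1.0]))
```

Minimizing this loss over many prompts pushes the small model toward the large model's behavior, which is what lets a compact Llama variant run on workstations and laptops.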

Once custom models are created, enterprises can build NVIDIA NIM inference microservices to run them in production using their choice of best-of-breed machine learning operations (MLOps) and artificial intelligence operations (AIOps) platforms on their preferred cloud platforms and NVIDIA-certified systems from global server manufacturers.


The NIM microservices help deploy Llama 3.1 models into production, delivering up to 2.5 times higher throughput than running inference without NIM.


Learn about the NVIDIA NIM inference microservice for Llama 3.1 models at ai.nvidia.com to accelerate deployment of Llama 3.1 models for production-grade AI.
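NIM microservices expose an OpenAI-compatible HTTP API, so deployment largely means pointing a standard client at the service. A minimal sketch of building such a request follows; the endpoint URL and model name are illustrative assumptions, not official values.

```python
import json

# Hypothetical local NIM endpoint for illustration; a deployed NIM
# exposes an OpenAI-compatible /v1/chat/completions route.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, user_message: str,
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a NIM service."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("meta/llama-3.1-8b-instruct",
                             "Summarize RAG in one sentence.")
print(json.dumps(payload, indent=2))
# To send: requests.post(NIM_URL, json=payload).json()
```

Because the route follows the OpenAI schema, existing MLOps tooling and client libraries can usually be reused unchanged.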

Combining the Llama 3.1 NIM microservice with the new NVIDIA NeMo Retriever NIM microservice enables building advanced retrieval workflows for AI copilots, assistants and digital human avatars.


Using the new NVIDIA NeMo Retriever NIM inference microservice for retrieval-augmented generation (RAG), enterprises can deploy custom Llama super models and Llama NIM microservices into production to improve response accuracy.

When combined with the NVIDIA NIM inference microservice for Llama 3.1 405B, the NeMo Retriever NIM microservice brings high retrieval accuracy to open and commercial text question answering in RAG workflows.
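The retrieval step in such a RAG workflow reduces to embedding a query, ranking document chunks by similarity, and placing the top hits into the generation prompt. The toy sketch below shows that flow with a stand-in embedding function; a real deployment would call a NeMo Retriever embedding service instead.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a character-frequency vector. A real pipeline
    # would call an embedding model such as a NeMo Retriever NIM.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["NIM serves models for inference.", "Llama 3.1 comes in three sizes."]
print(build_rag_prompt("How is a model served?", docs))
```

The assembled prompt would then go to the Llama 3.1 NIM microservice for generation, grounding the answer in the retrieved context.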


NVIDIA AI Foundry combines NVIDIA software, infrastructure, and expertise with open community models, technologies, and support from the NVIDIA AI ecosystem. NVIDIA AI Enterprise experts and global system integrator partners work with AI Foundry customers to accelerate the entire process from development to deployment.


Professional services firm Accenture is the first to adopt NVIDIA AI Foundry, building custom Llama 3.1 models with its Accenture AI Refinery framework for itself and for clients that want to deploy generative AI applications reflecting their culture, language, and industry.

Enterprises in industries such as healthcare, energy, financial services, retail, transportation, and telecommunications are already using NVIDIA NIM microservices for Llama. Aramco, AT&T, and Uber are among the first companies to use the new NIM microservices for Llama 3.1.

Hundreds of NVIDIA NIM partners who provide enterprise, data and infrastructure platforms can now integrate these new microservices into their AI solutions, powering generative AI for the NVIDIA community of more than 5 million developers and 19,000 startups.

Production support for Llama 3.1 NIM and NeMo Retriever NIM microservices is available through NVIDIA AI Enterprise. NVIDIA Developer Program members will soon have free access to NIM microservices for research, development and testing on their preferred infrastructure.