
Showing top 78 results for "AI training practices"

People also ask

Can low-precision training match BF16 accuracy at scale?

To validate the practical impact of low-precision training on real-world large-model pretraining, the team evaluated both training convergence and downstream task performance on two dense transformer architectures: Llama 3 8B and Research-8B, an NVIDIA internal research 8B model whose dense grouped query attention (GQA) architecture is similar to Llama 3 8B. Both models were trained on 1 trillion tokens.

Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy | NVIDIA Technical Blog

What is pipeline friction in AI model serving?

Pipeline friction refers to any obstacle that slows or disrupts the journey of a model from training to production inference. Unlike bugs that produce clear error messages, friction often manifests as subtle inefficiencies: for example, a model that consumes twice the expected GPU memory, an inference server that drops requests under load, or a deployment that works on one GPU architecture but fails on another. The most frequent sources of pipeline friction can be grouped into four categories: Model export issues: These arise when converting from training frameworks like PyTorch or TensorFl…

How to Eliminate Pipeline Friction in AI Model Serving | NVIDIA Technical Blog

What is model pruning?

Pruning is a model optimization technique that exploits the over-parameterization common in neural networks, which are typically trained with more capacity than strictly necessary so that they can learn complex features and converge smoothly. Pruning systematically identifies and removes unimportant parameters, such as weights, neurons, or even entire layers, from a trained model. This process can often eliminate a large fraction of a model’s weights with minimal impact on accuracy, yielding a more compact model with faster inference and lower computational cost. Similar to how an arborist trims a tree…

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
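One simple way to decide which parameters are "unimportant" is by magnitude. The sketch below, in plain Python with hypothetical function names, shows unstructured magnitude pruning of individual weights and a structured variant that drops whole neurons ranked by the L2 norm of their weight rows. It illustrates the general idea under those assumptions; it is not TensorRT Model Optimizer's pruning algorithm, which may score importance differently.

```python
import math


def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the `sparsity` fraction of weights with the
    smallest absolute value. Returns (pruned_weights, keep_mask)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from least to most important under the magnitude criterion.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    keep_mask = [i not in pruned_idx for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, keep_mask)]
    return pruned, keep_mask


def prune_neurons(rows, n_keep):
    """Structured pruning: keep the n_keep output neurons (weight-matrix rows)
    with the largest L2 norm, drop the others, and return (rows, kept_indices)."""
    norms = [math.sqrt(sum(w * w for w in row)) for row in rows]
    ranked = sorted(range(len(rows)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original neuron order
    return [rows[i] for i in kept], kept


# Usage: remove half of a toy layer's weights.
pruned, mask = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], sparsity=0.5)
print(pruned)  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Unstructured zeros only save compute when the runtime exploits sparsity, whereas dropping whole neurons or layers shrinks the dense matrices directly, which is why structured pruning maps more easily onto faster inference.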

What is response-based knowledge distillation?

Response-based knowledge distillation transfers a teacher model’s knowledge to a student by training the student to match the teacher’s soft output probabilities rather than only hard labels. These soft targets convey inter-class similarities, for example that “cat” is closer to “tiger” than to “car,” and the student is optimized to align with them using KL divergence. The approach is simple to implement, requires no access to the teacher’s internal features, and is highly effective for classification tasks. In practice, it’s common to combine the distillation loss with standard cross-entropy…

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
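The soft-target objective described above can be sketched in plain Python. The temperature, the alpha weighting between the distillation and cross-entropy terms, and the scaling of the KL term by the squared temperature are assumptions taken from common practice rather than from the source, and all function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * soft-target KL + (1 - alpha) * hard-label cross-entropy.

    temperature and alpha are illustrative hyperparameters. Scaling the KL term
    by temperature**2 is a common convention that keeps its gradient magnitude
    comparable across temperatures.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = -math.log(softmax(student_logits)[label])  # standard cross-entropy
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Hypothetical teacher logits for the classes ["cat", "tiger", "car"].
teacher = [4.0, 3.0, -2.0]
print([round(p, 3) for p in softmax(teacher, temperature=2.0)])  # [0.604, 0.366, 0.03]
```

The softened teacher distribution keeps far more probability on "tiger" than on "car", which is exactly the inter-class similarity a hard "cat" label would discard.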

How to Eliminate Pipeline Friction in AI Model Serving | NVIDIA Technical Blog

… Foster communication between training and deployment teams. Many friction sources originate from architectural decisions during training that have unintended deployment consequences. …

May 12, 2026 · Lovina Dmello

Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE | NVIDIA Technical Blog

… We show the local training code on the left and the federated version on the right, highlighting: import, flare.init, receive, send. train.py: import torch; import torchvision; import torchvision.transforms as transforms; from model import Net; batch_size = 4; epochs = 1; lr = 0.01; model = Net… …

Apr 24, 2026 · Holger Roth
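The snippet describes adapting an unchanged local train.py with a few added calls (flare.init, receive, send). The toy simulation below walks through the same receive, train locally, send cycle in plain Python so the data flow is visible. It does not use the NVIDIA FLARE API: ClientUpdate, local_train, federated_average, and run_federated are hypothetical names, and the sample-weighted (FedAvg-style) averaging on the server side is an assumed aggregation rule chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClientUpdate:
    """Hypothetical message a client returns: trained weights and its sample count."""
    params: list
    num_samples: int


def local_train(params, data, lr=0.1, epochs=1):
    """Unchanged 'local' training step: gradient descent on mean squared error
    for a toy one-parameter model y = w * x."""
    w = params[0]
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w]


def federated_average(updates):
    """Average client parameters, weighting each client by its number of samples."""
    total = sum(u.num_samples for u in updates)
    n_params = len(updates[0].params)
    return [sum(u.params[i] * u.num_samples for u in updates) / total
            for i in range(n_params)]


def run_federated(client_datasets, rounds=3, init=(0.0,)):
    """Simulate several rounds; raw data never leaves a client, only weights do."""
    global_params = list(init)
    for _ in range(rounds):
        updates = []
        for data in client_datasets:
            received = list(global_params)          # "receive" the global model
            trained = local_train(received, data)   # same code as local training
            updates.append(ClientUpdate(trained, len(data)))  # "send" it back
        global_params = federated_average(updates)
    return global_params


# Two hypothetical clients whose private data both follow y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
print(round(run_federated(clients)[0], 3))  # 1.999
```

The point of the design is that local_train stays identical to the single-machine version; only the code around it changes to receive the global weights and send back the result.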

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog

… In practice, we recommend multinode training for faster training. torchrun --nproc_per_node 8 /opt/NeMo/scripts/llm/gpt_train.py --name Qwen3-8B-nemo-width-pruned-distill --devices 8 --num_nodes 1 --tp_size 8 --model_path Qwen3-8B-nemo-width-pruned --teacher_path Qwen3-8B-nemo --legac… …

Oct 7, 2025 · Max Xu

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

… End-to-end training and evaluation recipes: We are releasing the complete training and evaluation recipe for Nemotron 3 Super, covering the full pipeline from pretraining through alignment. …

Mar 11, 2026 · Chris Alexiuk

Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy | NVIDIA Technical Blog

… This post compares the following three low-precision training formats directly against established BF16 precision training across multi-hundred-billion-token pretraining runs and downstream benchmarks: 8-bit floating point per-tensor current scaling (FP8-CS); mixed precision training with FP8 (MXFP8); NV… …

Feb 23, 2026 · Aditya Vavre
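As a rough illustration of the first format listed, FP8-CS, the sketch below simulates per-tensor current scaling in plain Python: the scale is derived from the current tensor's maximum absolute value (amax) so that amax maps onto 448, the largest finite E4M3 value, and each scaled element is then rounded to the E4M3 grid. The helper names are hypothetical, production FP8 training performs these steps in GPU kernels rather than Python, and this is not the exact recipe evaluated in the post. MXFP8 and NVFP4 differ in that they assign a separate scale to each small block of values instead of a single scale per tensor.

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x):
    """Round x to the nearest E4M3 value (3 mantissa bits, round-half-to-even),
    saturating at +/-448 instead of overflowing."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), E4M3_MAX)
    _, exp = math.frexp(a)   # a = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)     # clamp to the minimum normal exponent; below it, subnormal spacing
    step = 2.0 ** (e - 3)    # spacing of representable values in this binade
    q = min(round(a / step) * step, E4M3_MAX)
    return math.copysign(q, x)


def quantize_current_scaling(tensor):
    """Per-tensor current scaling: compute the scale from this tensor's own amax
    so its largest magnitude maps to E4M3_MAX, then round every element."""
    amax = max((abs(v) for v in tensor), default=0.0)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in tensor], scale


def dequantize(values, scale):
    """Undo the scaling to recover approximate high-precision values."""
    return [v / scale for v in values]


x = [0.5, -1.0, 0.25, 0.003]
q, s = quantize_current_scaling(x)
print(s, q)  # 448.0 [224.0, -448.0, 112.0, 1.375]
```

Because the scale is recomputed from each tensor as it is quantized, the full E4M3 range is always used; the cost is an extra amax reduction per tensor. The small value 0.003 shows the precision limit: it dequantizes to about 0.00307, within the roughly 1/16 relative error that 3 mantissa bits allow.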

NVIDIA Ising — AI for Quantum Computing

… NVIDIA Ising Calibration and Ising Decoding: NVIDIA Ising Decoding includes a training framework for decoder models, so you can train your own decoders and custom-tailor them to your quantum computer's noise models for the best performance. …

Mastering Agentic Techniques: AI Agent Evaluation | NVIDIA Technical Blog

… To learn more, watch the related GTC 2026 session and training lab on demand: Evaluation-Driven Development: Best Practices for Building Reliable Agents (GTC session) and Develop Production Agents with Eval-Driven Design (GTC training lab). …

May 19, 2026 · Edward Li

How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo | NVIDIA Technical Blog

… Specifically, it walks through how to install and configure AlpaGym, define closed-loop rewards, launch closed-loop training, and export the post-trained checkpoint for downstream use. Closed-loop post-training with AlpaGym extends AV training workflows by turning AlpaSim rollouts into training experience. …

Jun 1, 2026 · Boris Ivanovic

Mastering Agentic Techniques: AI Agent Customization | NVIDIA Technical Blog

… When to use: working with accessible data for well-defined tasks with output examples; customizing a model for a low-resource domain where labelled examples are limited and high-quality synthetic data can be generated to bootstrap the fine-tuning dataset; requiring the model to reliably produce speci… …

May 20, 2026 · Edward Li

Metropolis for Developers

… Recipe: Generating Synthetic Data for Multi-View Warehouse Detection and Tracking With Cosmos Transfer; Recipe: CARLA Simulator-to-Real Augmentation for Traffic Anomaly Scenarios With Cosmos Transfer; Recipe: Style-Guided Video Generation With Cosmos Transfer 2.5; Recipe: LoRA Post-Training for Sports… …
