Search

Showing top 10 results for "AI model rollout"

How to Post-Train Autonomous Vehicle Models in Closed-Loop with NVIDIA Alpamayo | NVIDIA Technical Blog

… This makes the AlpaGym post-trained policy runnable inside AlpaSim for closed-loop rollouts. ... model: model type: alpamayo1 checkpoint path: "/root/.cache/huggingface/alpasim models/alpamayo1 CLRL/step NNNNNN" device: "cuda" ... …

Jun 1, 2026 · Boris Ivanovic

Run High-Throughput Reinforcement Learning Training with End-to-End FP8 Precision | NVIDIA Technical Blog

… Summary of results for FP8 on KV cache and attention We ran results on the Qwen3-8B-Base model using the GRPO algorithm, with FP8 applied in rollout and BF16 for training. …

Apr 20, 2026 · Guyue Huang

Building Autonomous Vehicles That Reason with NVIDIA Alpamayo | NVIDIA Technical Blog

… Create a new configuration file for your model some examples can be found below driver configs/my model.yaml @package global services: driver: image: command: - " " And run: uv run alpasim wizard deploy=local topology=1gpu driver= wizard.log dir=$PWD/tutorial Examples of customization using the CLI… …

Jan 5, 2026 · Marco Pavone

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

… Model weights Full parameter checkpoints for Nemotron 3 Super are available on Hugging Face and through NVIDIA NIM. The NVIDIA Nemotron Open Model License gives enterprises the flexibility to maintain data control and deploy anywhere. …

Mar 11, 2026 · Chris Alexiuk

How to Eliminate Pipeline Friction in AI Model Serving | NVIDIA Technical Blog

… Pipeline friction in AI model serving arises from issues like model export problems, unsupported operations, dynamic input … …

May 12, 2026 · Lovina Dmello

Deploying Disaggregated LLM Inference Workloads on Kubernetes | NVIDIA Technical Blog

… It expresses all roles in a single PodCliqueSet: apiVersion: grove.io/v1alpha1 kind: PodCliqueSet metadata: name: inference-disaggregated spec: replicas: 1 template: cliqueStartupType: CliqueStartupTypeExplicit terminationDelay: 30s cliques: - name: router spec: roleName: router replicas: 2 podSpec… …

Mar 23, 2026 · Anish Maddipoti

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters | NVIDIA Technical Blog

… Over-provisioning: Engineers request entire GPUs to avoid contention, but models frequently use 30-50% of available memory and compute. …

May 21, 2026 · Guy Saltoun

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model | NVIDIA Technical Blog

… Explore the model catalog and use Nemotron models directly in your Azure environment. Inference service providers such as Baseten, Canonical, Clarifai, DeepInfra, Eigen AI, fal.AI, FriendliAI, and Fireworks AI. …

Apr 28, 2026 · Anjali Shah

Building Telco Reasoning Models for Autonomous Networks with NVIDIA NeMo | NVIDIA Technical Blog

Mar 1, 2026 · Aiden Chang

Achieving Single-Digit Microsecond Latency Inference for Capital Markets | NVIDIA Technical Blog

… LSTM A and p99 latency: 4.70 microseconds with one model instance; 4.67 microseconds with two model instances; 4.61 microseconds with four model instances; 4.67 microseconds with eight model instances. LSTM B and p99 latency: 7.10 microseconds with one model instance; 6.88 microseconds with two model in… …

Apr 2, 2026 · Nikolay Markovskiy
