Search

Showing top 82 results for "GPU needs for LLMs"

Mastering Agentic Techniques: AI Agent Customization | NVIDIA Technical Blog

…annotators, an LLM judge, rule-based verifiers, or synthetically generated preference data, since DPO is agnostic to the source of the preference signal. Preference signals eliminate the need for a separate reward…

May 20, 2026 · Edward Li

Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety | NVIDIA Technical Blog

…Nemotron 3 Super employs a hybrid Mamba-Transformer MoE architecture with NVFP4 precision on Blackwell GPUs, achieving high throughput and efficiency for multi-agent tasks, while Nemotron 3 Content Safety delivers low…

Mar 24, 2026 · Chintan Patel

Implementing Falcon-H1 Hybrid Architecture in NVIDIA Megatron Core | NVIDIA Technical Blog

…language model (LLM) development, NVIDIA Megatron Core has emerged as the foundational framework for training massive transformer models at scale. The open source library offers industry-leading parallelism and GPU-optimized performance…

Mar 9, 2026 · Mireille Fares

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

…High-throughput continuous batching and streaming for Super. SGLang Cookbook: Fast, lightweight inference optimized for multi-agent tool-calling workloads. NVIDIA TensorRT LLM Cookbook: Fully optimized TensorRT LLM engines with latent MoE…

Mar 11, 2026 · Chris Alexiuk

How to Minimize Game Runtime Inference Costs with Coding Agents | NVIDIA Technical Blog

…GPU between graphics and compute. Code agents: Trapping the ghost. Andrej Karpathy, a founding member of OpenAI, likens working with large language models (LLMs) to summoning ghosts, an apt metaphor for LLM…

Mar 3, 2026 · Brandon Rowlett

How to Build License-Compliant Synthetic Data Pipelines for AI Model Distillation | NVIDIA Technical Blog

…It details how to build reproducible, structured product Q&A datasets by combining controlled sampling, LLM-based generation, and automated LLM-as-a-judge quality scoring, ensuring datasets are ready for distillation…

Feb 5, 2026 · Alex Steiner

How NVIDIA Extreme Hardware-Software Co-Design Delivered a Large Inference Boost for Sarvam AI’s Sovereign Models | NVIDIA Technical Blog

…Furthermore, we utilized a ReplicatedLinear block for the router logits. Since the router weights are small, replicating them across GPUs eliminates the need for expensive communication during the gating phase, keeping the…

Feb 18, 2026 · Utkarsh Uppal
