
LLMs

Large language models are machine learning models trained to predict and generate text and other language-based outputs.

373 articles indexed

NVIDIA Nemotron 3 Nano Omni: Accelerating Multimodal Agent Reasoning with a Single Open Model

…NVIDIA TensorRT LLM Cookbook: TensorRT LLM engines fully optimized down to latent MoE kernels for production-grade, low-latency deployment. Dynamo deployment recipes: disaggregated serving, intelligent routing, multi-tier KV caching, multimodal NVIDIA Nemotron 3…

May 12, 2026 · Anjali Shah

How to Integrate Computer Vision Pipelines with Generative AI and Reasoning | NVIDIA Technical Blog

…This enables more actionable insights. The NVIDIA Blueprint for video search and summarization (VSS) brings together vision language models (VLMs), large language models (LLMs), and retrieval-augmented generation (RAG) with optimized ingestion…

Sep 25, 2025 · Samuel Ochoa

NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model | NVIDIA Technical Blog

…NVIDIA TensorRT LLM Cookbook: Fully optimized TensorRT LLM engines with latent MoE kernels for production-grade, low-latency deployment. Dynamo deployment recipes: Disaggregated serving, intelligent routing, multi-tier KV caching, and automatic…

Apr 28, 2026 · Anjali Shah

Model Quantization: Post-Training Quantization (PTQ) with NVIDIA Model Optimizer

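…Vision encoders serve as the visual backbone for multimodal LLMs such as LLaVA and for open-vocabulary perception models such as OWL-ViT. Successor models such as OpenCLIP and SigLIP scaled up training data and refined training objectives, but dual-encoder…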

May 20, 2026 · Ruixiang Wang

Enhancing Distributed Inference Performance with the NVIDIA Inference Transfer Library | NVIDIA Technical Blog

…Deploying large language models (LLMs) requires large-scale distributed inference, which spreads model computation and request handling across many GPUs and nodes to scale to more users while reducing latency…

Mar 9, 2026 · Seonghee Lee

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell | NVIDIA Technical Blog

…The NVFP4 training recipe for JAX (as implemented in MaxText) preserves convergence in large-scale LLM training through five core techniques: 16-element micro block scaling, E4M3 block scale factors under a…

Jun 8, 2026 · Max Xu

Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI | NVIDIA Technical Blog

…The model supports deployment via open-source frameworks like SGLang, NVIDIA TensorRT-LLM, and vLLM, leveraging NVIDIA-accelerated infrastructure and GPU-accelerated endpoints for prototyping and evaluation. NVIDIA NIM enables production-ready…

May 29, 2026 · Anu Srivastava

Using NVFP4 Low-Precision Model Training for Higher Throughput Without Losing Accuracy | NVIDIA Technical Blog

…His past work includes 4-bit and 8-bit LLM pretraining, quantization-aware training and distillation, and sparse attention mechanisms, enabling more efficient long-context and large-scale transformer models. Prior to…

Feb 23, 2026 · Aditya Vavre

Mastering Agentic Techniques: AI Agent Customization | NVIDIA Technical Blog

…Instead of hand-authoring every training example, teams can define a data schema and use LLMs to generate diverse, high-quality training pairs. Then, run SFT on that generated dataset using an…

May 20, 2026 · Edward Li

NVIDIA Nsight Copilot

…Combined with our specialized CUDA-aware LLM inference models, Nsight Copilot delivers the best coding experience for CUDA developers. Nsight Copilot is powered by NVIDIA NIM™ microservices…

Followed topics