
Showing top 98 results for "NVFP4"

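…Native NVFP4 pretraining: Optimized for NVIDIA Blackwell, it significantly reduces memory requirements while boosting inference speed on NVIDIA B200 by up to 4x compared with FP8 on NVIDIA H100, all while maintaining accuracy. Multi-environment reinforcement learning (RL): With NVIDIA NeMo Gym and…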

May 14, 2026 · Chris Alexiuk

Bringing AI Closer to the Edge and On-Device with Gemma 4 | NVIDIA Technical Blog

…From Blackwell, with NVFP4 quantized checkpoints coming soon, to Jetson platforms, developers can quickly get started deploying these high-accuracy multimodal models, with the flexibility to meet their speed, security, and cost…

Apr 2, 2026 · Anu Srivastava

Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety | NVIDIA Technical Blog

…Nemotron 3 Super employs a hybrid Mamba-Transformer MoE architecture with NVFP4 precision on Blackwell GPUs, achieving high throughput and efficiency for multi-agent tasks, while Nemotron 3 Content Safety delivers low…

Mar 24, 2026 · Chintan Patel

NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark

…The updates deliver 2.6x performance on DGX Spark compared with the previously available NVFP4 checkpoints from Unsloth, and include kernel improvements as well as mixed precision, and CUDA Graph support for…

Jun 1, 2026 · Gerardo Delgado

Discussions and forums

r/LocalLLaMA · u/LLMFan46 · 2w ago

Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUF, NVFP4, NVFP4 GGUF, and GPTQ-Int4 Formats

Safetensors, llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved: https://huggingface.co/llmfan46/Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved GGUFs, llmfan46/Qwen3.5-35B-A3B-uncensored-here…

r/LocalLLaMA · u/Kurcide · May 1, 2026

16x Spark Cluster (Build Update)

Build is done. 16 DGX Sparks on the fabric, all hitting line rate. Setup was time consuming but honestly smoother than I expected. Each Spark runs Nvidia’s flavor of Ubuntu out of the box with mostly everything pre insta…

r/nvidia · u/Kurcide · May 1, 2026

My 16x DGX Spark Cluster (HomeLab)

Added a 16x Spark Cluster to my homelab over the last few days. Curious if this is the largest Spark cluster anyone has built. About 2 years ago I had renovated my basement and built a personal lab/datacenter into my off…

r/homelab · u/Kurcide · May 1, 2026

Added a 16x DGX Spark cluster to my Homelab (Build Update)

Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo | NVIDIA Technical Blog

…Harness-facing Dynamo settings: Our experiments used the newly released nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 model, though the same issues apply across models, reasoning parsers, and tool-call parsers…

May 8, 2026 · Matej Kosec

Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

…Native NVFP4 pretraining optimized for NVIDIA Blackwell, significantly cutting memory requirements and speeding up inference by 4x on NVIDIA B200 compared to FP8 on NVIDIA H100, while maintaining accuracy. Multi-environment reinforcement…

Mar 11, 2026 · Chris Alexiuk

Data Center Deep Learning Product Performance Hub

…This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-out; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility…

Hermes Unlocks Self-Improving AI Agents, Powered by NVIDIA RTX PCs and DGX Spark

…Pair the NVFP4 checkpoints with Google’s new Multi-Token Prediction drafters to get up to 3x faster inference at identical output quality, enabling frontier-class reasoning to run locally on NVIDIA…

May 13, 2026 · Abhishek Gore

NVIDIA's RTX Spark Is a Direct Shot at the PC Market, Backed by a Multi-Gen Roadmap That Past 'Windows on Arm' Bids Never Had

…NVIDIA developed the GB10 Superchip, which combines innovations from datacenters, such as NVFP4, CUDA, SLANG, TensorRT, vLLM, CX-7 NIC, NVLINK C2C, TMEM, and more, down to client platforms that utilize a…

Jun 1, 2026 · Hassan Mujtaba

NVIDIA Nemotron 3 Nano Omni: Accelerating Multimodal Agentic Reasoning with a Single Open Model

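…It also supports FP8 and NVFP4 quantization, efficient video sampling, and NVIDIA-optimized kernels to deliver predictable, low-latency inference. Combined with 3D convolution-based spatiotemporal processing, this enables GPUs across workstations, data centers, and cloud deployment environments to…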

May 12, 2026 · Anjali Shah
