How do pruning and distillation impact model performance?
Experimental results for pruning and distilling Qwen3-8B with Model Optimizer show that the depth-pruned Qwen3 6B model is 30% faster than the Qwen3-4B model while also scoring higher on the MMLU (Massive Multitask Language Understanding) benchmark. Depth pruning reduced the model from 36 to 24 layers, producing the 6B model, and was run on a single NVIDIA H100 80 GB HBM3 GPU. The pruned model was distilled from Qwen3-8B using the OptimalScale/ClimbMix data, which is processed from the nvidia/ClimbMix pretraining dataset; the experiment uses 25% of this data, approximately 90B tokens. Distillatio