Qwen3

Qwen3 is a family of large language models developed by Alibaba for natural-language tasks.


How do pruning and distillation impact model performance?

Experimental results for pruning and distillation from Qwen3-8B using NVIDIA TensorRT Model Optimizer show that the depth-pruned Qwen3 6B model is 30% faster than Qwen3-4B while also scoring higher on the MMLU (Massive Multitask Language Understanding) benchmark. Depth pruning reduced the model from 36 to 24 layers, yielding a 6B-parameter model, and was run on a single NVIDIA H100 80 GB HBM3 GPU. The pruned model was then distilled from Qwen3-8B using OptimalScale/ClimbMix data processed from the nvidia/ClimbMix pretraining dataset; the experiment used 25% of the data, approximately 90B tokens.
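The snippet does not spell out the distillation objective, but pipelines like this typically train the pruned student to match the teacher's temperature-softened output distribution. Below is a minimal, self-contained sketch of that standard logit-distillation loss (softmax with temperature, KL divergence scaled by T²); the function names and temperature value are illustrative, not Model Optimizer's actual API.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# When the student matches the teacher exactly, the loss is zero.
assert abs(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])) < 1e-9
```

In practice this term is averaged over the vocabulary logits at every token position and often mixed with the ordinary cross-entropy loss on ground-truth labels.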

Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog