Pruning and Distilling LLMs Using NVIDIA TensorRT Model Optimizer | NVIDIA Technical Blog
…He brings full-stack GPU expertise, from chip design through CUDA and kernel-level development to server and cloud for model training and inference, translating innovations into real-world impact. Before NVIDIA…