Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs | NVIDIA Technical Blog
…FP8 recipe, developers with hardware resource constraints can use INT4 AWQ in TensorRT Model Optimizer to further compress the model. The INT4 AWQ technique reduces the required memory footprint significantly, enabling a…
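To give a rough sense of why lower-precision weights matter at this scale, here is a minimal back-of-the-envelope sketch of weight-only storage at different precisions. It rests on assumptions not stated in the excerpt: a nominal 405 billion parameters, every weight stored at the listed bit width, and, for the INT4 case with scales, one 16-bit scale per group of 128 weights (a common AWQ setting, assumed here). It ignores KV cache, activations, and runtime overhead, so real deployments will need more memory than these figures.

```python
# Rough weight-only memory estimate (illustrative, not a measurement).
# Assumptions: nominal 405e9 parameters; all weights stored at the given width.
PARAMS = 405e9

BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "INT4": 4}


def weight_gb(params, bits, group_size=None, scale_bits=16):
    """Return weight storage in GB (1 GB = 1e9 bytes).

    If group_size is given, add one scale of scale_bits per group of weights,
    which approximates the overhead of group-wise quantization.
    """
    total_bits = params * bits
    if group_size:
        total_bits += params / group_size * scale_bits
    return total_bits / 8 / 1e9


footprints = {name: weight_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}
# Group size 128 with 16-bit scales is an assumed configuration.
footprints["INT4 + scales"] = weight_gb(PARAMS, 4, group_size=128)

for name, gb in footprints.items():
    print(f"{name:>14}: {gb:8.1f} GB")
```

Under these assumptions, INT4 weights take about a quarter of the FP16 footprint and half of the FP8 footprint, with per-group scales adding only a few percent on top.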