LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?
…Across a diverse set of benchmarks covering document understanding, OCR, and general VQA, LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% while matching or even surpassing baseline performance. These results…