Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception
…interior environments, consumption content, and social portraits. The benchmark supports three unified tasks within one standardized library: 🏷️ T1, urban scene semantic classification; 🔍 T2, cross-modal image–text retrieval; 🎯 T3, instance segmentation…
