X2SAM: Any Segmentation in Images and Videos
…Hao Wang

Abstract

X2SAM is a unified multimodal model that extends segmentation capabilities from images to videos while supporting conversational instructions and visual prompts for both modalities. (AI-generated summary)

Multimodal Large…