Paper page - MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
…replaces the dense token-wise indexing in sparse attention with a routed mixture-of-experts approach that reduces computational cost while maintaining performance and handling long contexts effectively. (AI-generated summary)
DeepSeek…