Claude, ChatGPT, and Gemini get all the hype, but the most interesting AI models are coming from elsewhere
… MiMo-V2-Flash interleaves Sliding Window Attention and Global Attention layers at a 5:1 ratio with an aggressive 128-token window, which shrinks KV-cache storage nearly sixfold while preserving long-context performance through a learnable attention-sink bias. …
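To see where the roughly sixfold saving comes from, here is a back-of-the-envelope sketch. It assumes each sliding-window layer caches only its most recent 128 tokens while each global layer caches the full context, and it counts cached token positions rather than bytes. The function names and the 48-layer depth are illustrative placeholders, not details taken from the model.

```python
def hybrid_kv_tokens(context_len, num_layers=48, swa_per_global=5, window=128):
    """Cached token positions, summed over layers, for an interleaved stack.

    num_layers=48 is a hypothetical depth chosen for illustration only.
    """
    group = swa_per_global + 1              # 5 sliding-window layers + 1 global layer
    num_global = num_layers // group        # layers that cache the whole context
    num_swa = num_layers - num_global       # layers that cache only the window
    swa_tokens = min(context_len, window)   # a window never holds more than the context
    return num_global * context_len + num_swa * swa_tokens


def reduction_factor(context_len, num_layers=48, swa_per_global=5, window=128):
    """How many times smaller the hybrid cache is than an all-global cache."""
    full = num_layers * context_len
    hybrid = hybrid_kv_tokens(context_len, num_layers, swa_per_global, window)
    return full / hybrid


if __name__ == "__main__":
    for n in (128, 4_096, 32_768, 262_144):
        print(f"{n:>7} tokens: {reduction_factor(n):.2f}x smaller")
```

Under these assumptions the factor is 1 at 128 tokens, where the window already covers everything, and climbs toward 6 as the context grows, because the fixed 128-token windows become negligible next to the one-in-six global layers. That is why the saving is "almost" rather than exactly six times.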