XiaomiMiMo/MiMo-V2-Flash
MiMo-V2-Flash is a 309B-total-parameter (15B active) Mixture-of-Experts (MoE) language model developed by XiaomiMiMo, designed for high-speed reasoning and agentic workflows. It combines a hybrid attention architecture with Multi-Token Prediction (MTP) to speed up inference and to support long-context processing up to 256k tokens. The model targets complex reasoning, code-agent tasks, and general agent use, and achieves strong performance across benchmarks in these areas.
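For orientation, below is a minimal loading-and-generation sketch using the Hugging Face transformers API. The specific flags (torch_dtype, device_map, trust_remote_code) and chat-template support are assumptions about this checkpoint rather than documented requirements, and serving the full 309B weights would realistically need a multi-GPU node or a dedicated inference engine.

```python
# Minimal inference sketch. Assumes the checkpoint exposes a standard
# transformers causal-LM interface with a chat template; dtype, device-map,
# and trust_remote_code settings here are assumptions, not documented specs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # load in the checkpoint's native precision
    device_map="auto",       # shard the MoE weights across available GPUs
    trust_remote_code=True,  # hybrid-attention/MTP layers may ship custom code
)

messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because only 15B of the 309B parameters are active per token, per-token compute is closer to a 15B dense model, but all expert weights must still fit in (possibly sharded) memory, which is what device_map="auto" is standing in for here.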