XiaomiMiMo/MiMo-V2-Flash
MiMo-V2-Flash is a 309B-total-parameter (15B active) Mixture-of-Experts (MoE) language model developed by XiaomiMiMo, designed for high-speed reasoning and agentic workflows. It combines a hybrid attention architecture with Multi-Token Prediction (MTP) to speed up inference and to support long-context processing up to 256k tokens. The model targets complex reasoning, code-agent tasks, and general agent use, and achieves strong performance across benchmarks in these areas.
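For orientation, below is a minimal loading-and-generation sketch using the Hugging Face transformers API. The specific flags (torch_dtype, device_map, trust_remote_code) and chat-template support are assumptions about this checkpoint rather than documented requirements, and serving the full 309B weights would realistically need a multi-GPU node or a dedicated inference engine.

```python
# Minimal inference sketch. Assumes the checkpoint exposes a standard
# transformers causal-LM interface with a chat template; dtype, device-map,
# and trust_remote_code settings here are assumptions, not documented specs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-V2-Flash"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # load in the checkpoint's native precision
    device_map="auto",       # shard the MoE weights across available GPUs
    trust_remote_code=True,  # hybrid-attention/MTP layers may ship custom code
)

messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because only 15B of the 309B parameters are active per token, per-token compute is closer to a 15B dense model, but all expert weights must still fit in (possibly sharded) memory, which is what device_map="auto" is standing in for here.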