miromind-ai/MiroThinker-4B-DPO-v0.2

MiroThinker-4B-DPO-v0.2 by miromind-ai is a 4-billion-parameter open-source agentic model designed for complex, long-horizon problem solving. It combines task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document processing. This version adds richer English and Chinese training data, unified DPO training, and an extended 40,960-token context length, and it shows significant gains on research-agent benchmarks such as GAIA-Text-103 and BrowseComp-ZH.

Warm

Public

Model Size: 4B

Quant: BF16

Context length: 40960 tokens

License: apache-2.0
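
The size and precision listed above give a rough lower bound on the memory needed just to hold the weights. The sketch below is a back-of-the-envelope estimate only: it assumes the nominal 4B figure is the exact parameter count (it is a rounded number), that every parameter is stored in the listed format, and it ignores the KV cache, activations, and runtime overhead. The helper name `weight_memory_gib` and the lookup table are hypothetical, not part of any MiroThinker tooling.

```python
# Hypothetical helper for a rough weight-memory estimate; illustration only.
# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4, "INT8": 1}


def weight_memory_gib(num_params: float, quant: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[quant] / 2**30


# Assumption: "4B" is treated as exactly 4e9 parameters.
nominal_params = 4e9
print(f"{weight_memory_gib(nominal_params, 'BF16'):.2f} GiB")  # prints "7.45 GiB"
```

Actual usage will be higher, particularly when serving prompts near the full 40960-token context, because the KV cache grows with sequence length.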

