miromind-ai/MiroThinker-4B-DPO-v0.2

MiroThinker-4B-DPO-v0.2 by miromind-ai is a 4 billion parameter open-source agentic model designed for complex, long-horizon problem solving. It integrates capabilities such as task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document processing. This version features richer English and Chinese training data, unified DPO training, and an extended 40960-token context length, showing significant gains in research agent benchmarks like GAIA-Text-103 and BrowseComp-ZH.

Warm
Public
4B
BF16
40960
License: apache-2.0
Hugging Face

No reviews yet. Be the first to review!