miromind-ai/MiroThinker-4B-DPO-v0.2
MiroThinker-4B-DPO-v0.2 by miromind-ai is a 4-billion-parameter open-source agentic model designed for complex, long-horizon problem solving. It integrates task decomposition, multi-hop reasoning, retrieval-augmented generation, code execution, web browsing, and document processing. This version features richer English and Chinese training data, unified DPO training, and an extended 40,960-token context length, and it shows significant gains on research-agent benchmarks such as GAIA-Text-103 and BrowseComp-ZH.