open-r1/OpenR1-Distill-7B

OpenR1-Distill-7B is a 7.6-billion-parameter GPT-like model from open-r1, post-trained from a variant of Qwen/Qwen2.5-Math-7B whose RoPE base frequency was extended to support a 32k-token context. It aims to replicate the reasoning capabilities of DeepSeek-R1 through distillation on 350k verified reasoning traces spanning mathematics, coding, and science tasks. The model is geared toward step-by-step reasoning and suits research on inference-time compute and reinforcement learning with verifiable rewards.
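To make the RoPE remark above concrete, here is a minimal sketch of how raising the base frequency stretches the rotary wavelengths. It is not the authors' code: the head dimension of 128 and the two base values (10,000 and 300,000) are illustrative assumptions, and the helper names are hypothetical. Only the 32k context figure comes from the description.

```python
import math


def rope_wavelengths(base: float, head_dim: int) -> list[float]:
    """Wavelength, in tokens, of each rotary frequency pair.

    Pair i rotates by pos * base ** (-2 * i / head_dim) radians at
    position pos, so one full turn takes 2 * pi * base ** (2 * i / head_dim)
    tokens.
    """
    return [
        2 * math.pi * base ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def pairs_completing_a_cycle(base: float, head_dim: int, context_len: int) -> int:
    """Count frequency pairs whose wavelength fits inside the context window."""
    return sum(w <= context_len for w in rope_wavelengths(base, head_dim))


HEAD_DIM = 128    # assumed head dimension, not stated in the description
CONTEXT = 32_768  # the 32k-token context mentioned in the description

# Both base values are illustrative assumptions, not taken from the description.
for base in (10_000.0, 300_000.0):
    longest = rope_wavelengths(base, HEAD_DIM)[-1]
    wrapped = pairs_completing_a_cycle(base, HEAD_DIM, CONTEXT)
    print(f"base={base:>9.0f}  longest wavelength={longest:,.0f} tokens  "
          f"pairs wrapping within context={wrapped}/{HEAD_DIM // 2}")
```

A larger base lengthens every wavelength except the first, so fewer frequency pairs complete a full rotation inside the window. This is the usual motivation for raising the base before training on longer sequences.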

Warm
Public
7.6B
FP8
131072
License: apache-2.0
Hugging Face
