Blancy/Qwen3-1.7B-Open-R1-GRPO

Blancy/Qwen3-1.7B-Open-R1-GRPO is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B. It was trained with the GRPO method on the Blancy/1ktestfrom10kwithdifficultyclasses dataset to enhance mathematical reasoning. With a 40,960-token context length, the model is suited to complex, multi-step problem solving and detailed analytical tasks.
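A minimal sketch of querying the model with the Hugging Face transformers library. The model id comes from this card; the helper function name, the default generation length, and the prompt are illustrative, and standard transformers calls are assumed to be available in your environment.

```python
# Sketch: load Blancy/Qwen3-1.7B-Open-R1-GRPO and generate an answer.
# The generate() helper and its defaults are illustrative, not part of the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Blancy/Qwen3-1.7B-Open-R1-GRPO"


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model (per call, for brevity) and answer a single user prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the tensor type listed on the card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; return only the newly generated completion.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate("Solve: what is 12 * 17?")` would return the model's step-by-step answer as a string.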

Model details:
- Visibility: Public
- Parameters: 1.7B
- Tensor type: BF16
- Context length: 40,960 tokens