yasserrmd/Coder-GRPO-3B

Coder-GRPO-3B is a 3-billion-parameter, instruction-tuned causal language model developed by yasserrmd and based on Qwen/Qwen2.5-3B-Instruct. It is fine-tuned with Group Relative Policy Optimization (GRPO) on the glaiveai/glaive-code-assistant dataset. The model targets code reasoning and generation, and is tuned to produce short, correct programs with concise explanations. It is best suited to high-signal code tasks: writing, refactoring, explaining, and fixing code.
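
As a rough illustration of the "group relative" part of GRPO: several completions are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The sketch below shows only that normalization step. The reward values are hypothetical, the function name is made up for illustration, and the use of population standard deviation is an assumption; the card does not describe the reward function or training setup actually used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: population standard deviation plus a small eps to
    avoid division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt,
# e.g. 1.0 if the generated program passes a check, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below average get a negative one. When every completion in a group scores the same, all advantages are zero and that group contributes no learning signal.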

Model Size: 3.1B

Quantization: BF16

Context length: 32,768 tokens

License: apache-2.0
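
The size and precision figures above give a quick lower bound on the memory needed to hold the weights. The sketch below assumes 2 bytes per parameter for BF16 and counts weights only; the KV cache, activations, and framework overhead are not included and depend on details this card does not state. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB.

    bytes_per_param defaults to 2, the width of a BF16 value.
    """
    return num_params * bytes_per_param / 2**30


# 3.1B parameters stored in BF16
bf16_weights_gib = weight_memory_gib(3.1e9)
```

For 3.1B parameters this comes to roughly 5.8 GiB (about 6.2 GB) for the weights, so actual usage at long context lengths will be higher.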
