yasserrmd/Coder-GRPO-3B
Coder-GRPO-3B is a 3-billion-parameter instruction-tuned causal language model developed by yasserrmd, based on Qwen/Qwen2.5-3B-Instruct. It is fine-tuned with Group Relative Policy Optimization (GRPO) on the glaiveai/glaive-code-assistant dataset. The model targets code reasoning and generation, aiming to produce short, correct programs with concise explanations. It is best suited to high-signal code tasks: writing, refactoring, explaining, and fixing code.
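For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that group-relative normalization step. It is not the training code used for this model: the function name, the reward values, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed small constant) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt,
# e.g. 1.0 if the generated program passed its checks, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion scored the same carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged. When all completions in a group score the same, every advantage is zero and that prompt contributes no gradient.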