pavlodp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-bristly_freckled_weasel

pavlodp/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-bristly_freckled_weasel is a 0.5-billion-parameter instruction-tuned causal language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with the TRL library using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced to improve mathematical reasoning in language models. The model is suited to instruction-following tasks and, because of this training method, may also perform better on mathematical problems than its base model.
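To make the GRPO step more concrete, here is a minimal sketch of its central idea: each sampled completion is scored, and its advantage is its reward normalized against the other completions sampled for the same prompt. This is an illustration only, not TRL's implementation and not this model's training code. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for the sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed value) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math prompt:
# 1.0 for a correct final answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below the mean receive a negative one. Because the baseline comes from the group itself, no separate value model is needed.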


Public

Model Size: 0.5B

Quant: BF16

Ctx length: 131072
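As a rough guide to what the size and precision figures above imply, the sketch below estimates the memory needed for the weights alone. It assumes the nominal count of 0.5 billion parameters (the exact count may differ) and 2 bytes per parameter for BF16. It leaves out the KV cache, activations, and runtime overhead, all of which grow with the context length in use. The helper name `weight_memory_bytes` is hypothetical.

```python
def weight_memory_bytes(num_params, bytes_per_param):
    """Return the storage needed for the model weights alone, in bytes."""
    return num_params * bytes_per_param


NOMINAL_PARAMS = 500_000_000  # "0.5B" as listed; nominal, not the exact count
BF16_BYTES = 2                # bfloat16 stores each value in 16 bits

weights_bytes = weight_memory_bytes(NOMINAL_PARAMS, BF16_BYTES)
weights_gib = weights_bytes / 2**30

print(f"Approximate weight memory: {weights_gib:.2f} GiB")
```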
