nather/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-gliding_tenacious_leopard

The nather/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-gliding_tenacious_leopard model is a 0.5 billion parameter instruction-tuned language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained using TRL and incorporates the GRPO method, which is designed to enhance mathematical reasoning. This model is suitable for tasks requiring instruction following and potentially benefits from improved mathematical capabilities due to its training methodology.

Warm

Public

Model Size: 0.5B

Quant: BF16

Ctx length: 131072

Hugging Face