nather/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-gliding_tenacious_leopard

The nather/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-gliding_tenacious_leopard model is a 0.5 billion parameter instruction-tuned language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained using TRL and incorporates the GRPO method, which is designed to enhance mathematical reasoning. This model is suitable for tasks requiring instruction following and potentially benefits from improved mathematical capabilities due to its training methodology.

Warm
Public
0.5B
BF16
131072
Hugging Face

No reviews yet. Be the first to review!