Leoman777/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-striped_armored_gerbil

Leoman777/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-striped_armored_gerbil is a 0.5-billion-parameter instruction-tuned causal language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced to improve mathematical reasoning. The model targets tasks that require structured, step-by-step reasoning, particularly mathematical problems where precise logical inference matters.
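To make the GRPO reference above more concrete, here is a minimal sketch of the group-relative advantage idea at its core: several completions are sampled for one prompt, each gets a scalar reward, and each reward is normalized against the group's mean and spread. This is an illustration only, not the training code used for this model. The function name `group_relative_advantages`, the `eps` constant, the example rewards, and the choice of population standard deviation are all assumptions; real implementations such as TRL's may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration: each advantage is
    (reward - group mean) / (group std + eps). The eps term is an assumed
    safeguard against division by zero when all rewards are equal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, chosen here for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average receive positive advantages and are reinforced, while below-average ones receive negative advantages. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.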

Warm
Public
0.5B
BF16
131072
Hugging Face