Leoman777/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-striped_armored_gerbil

Leoman777/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-striped_armored_gerbil is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. This model was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring structured reasoning, particularly in mathematical contexts, making it suitable for specialized applications where precise logical inference is crucial.

Warm

Public

Model Size: 0.5B

Quant: BF16

Ctx length: 131072

Hugging Face