encoderrr/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_pensive_eagle

encoderrr/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_pensive_eagle is a 0.5-billion-parameter instruction-tuned language model fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to improve mathematical reasoning. The model is intended for tasks that require robust reasoning, particularly in mathematical contexts, and supports a 32,768-token context length.

Model details:
- Parameters: 0.5B
- Tensor type: BF16
- Context length: 32,768 tokens
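
Usage

The card does not include a usage snippet; the sketch below shows one plausible way to load the checkpoint with the Hugging Face transformers library and run a short math prompt. The prompt, generation settings, and dtype handling are illustrative assumptions, not part of the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "encoderrr/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_pensive_eagle"

# Load tokenizer and model; torch_dtype="auto" picks up the BF16 weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Example math-reasoning prompt (illustrative).
messages = [
    {"role": "user", "content": "What is 17 * 24? Show your reasoning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```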
