bourne321/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-quick_unseen_buffalo

bourne321/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-quick_unseen_buffalo is a 0.5 billion parameter instruction-tuned language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. This model was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. With a notable context length of 131072 tokens, it is suitable for tasks requiring extensive contextual understanding, particularly those benefiting from improved mathematical processing.

Warm
Public
0.5B
BF16
131072
Hugging Face

No reviews yet. Be the first to review!