arnuc/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-jumping_soft_ibis

The arnuc/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-jumping_soft_ibis model is a 0.5-billion-parameter, instruction-tuned causal language model fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced for mathematical reasoning in DeepSeekMath. The fine-tuning targets reasoning-oriented tasks, so the model is intended for multi-step problem-solving and instruction following.
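For readers unfamiliar with GRPO, its central idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative normalization only. It is not this model's training code: the function name, the `eps` term, the use of the population standard deviation, and the reward values are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled with.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (population form, an assumption here).
    `eps` avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean receive a positive advantage and those below it a negative one, so the policy update favors the better answers within each group.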

Status: Warm
Visibility: Public
Parameters: 0.5B
Precision: BF16
Context length: 131072
Source: Hugging Face
