arnuc/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-jumping_soft_ibis

The arnuc/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-jumping_soft_ibis model is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced for mathematical reasoning in DeepSeekMath. The fine-tuning targets reasoning-oriented tasks, so the model is intended for multi-step problem solving and instruction following.
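
To make the GRPO reference more concrete, here is a minimal sketch of the group-relative advantage that gives the method its name: several completions are sampled for one prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scores for several completions of the SAME prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.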

Model Size: 0.5B

Quant: BF16

Ctx length: 131072
