hbx/JustRL-DeepSeek-1.5B

hbx/JustRL-DeepSeek-1.5B is a 1.5-billion-parameter language model from hbx, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B with a simplified reinforcement learning (RL) recipe: a single training stage with fixed hyperparameters. Despite this simple setup, the model achieves state-of-the-art results for its size on mathematical reasoning tasks, matching or exceeding more complex RL methods while requiring significantly less compute. Its small footprint makes it well suited to mathematical problem-solving in resource-constrained settings.

Model Size: 1.5B

Quant: BF16

Context length: 131072 tokens

License: apache-2.0

Source: Hugging Face
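As a rough guide to hardware requirements, the Model Size and Quant fields above let you estimate how much memory the weights alone occupy. The following is a minimal Python sketch, assuming the nominal 1.5B parameter count (the exact count may differ) and 2 bytes per parameter for BF16. The helper name `weight_memory_gib` is hypothetical, and the estimate deliberately ignores activations, the KV cache, and runtime overhead.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumptions: the nominal 1.5B parameter count from this card, and 2 bytes
# per parameter for BF16 (4 for FP32, shown for comparison). Activations,
# KV cache, and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the weight footprint in GiB (2**30 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 1.5e9  # nominal parameter count (assumption: actual count may differ)
    for dtype in ("fp32", "bf16"):
        print(f"{dtype}: {weight_memory_gib(n, dtype):.2f} GiB")
```

Running it prints about 5.59 GiB for FP32 and 2.79 GiB for BF16, so the BF16 weights alone need roughly 3 GB of accelerator memory. Using long contexts, up to the 131072-token limit, adds KV-cache memory on top of that figure.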