hbx/JustRL-DeepSeek-1.5B
hbx/JustRL-DeepSeek-1.5B is a 1.5-billion-parameter language model developed by hbx, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B with a simplified reinforcement learning (RL) recipe: single-stage training with fixed hyperparameters. It delivers competitive performance on mathematical reasoning tasks, with results described as state-of-the-art at its scale, and it matches or exceeds more complex methods at significantly lower computational cost. This makes it suitable for resource-constrained mathematical problem-solving applications.