hbx/JustRL-Nemotron-1.5B
hbx/JustRL-Nemotron-1.5B is a 1.5-billion-parameter language model developed by hbx and fine-tuned for mathematical reasoning. It achieves state-of-the-art performance at its scale using a simplified reinforcement learning (RL) approach: single-stage training with fixed hyperparameters. The model posts competitive results on a range of mathematical benchmarks at significantly lower computational cost than more complex multi-stage methods.