hbx/JustRL-Nemotron-1.5B

hbx/JustRL-Nemotron-1.5B is a 1.5-billion-parameter language model developed by hbx and fine-tuned for mathematical reasoning. It achieves state-of-the-art performance at its scale using a simplified reinforcement learning (RL) approach: single-stage training with fixed hyperparameters. The model is competitive on a range of mathematical benchmarks while requiring significantly less compute than more complex multi-stage methods.

Status: Cold
Visibility: Public
Parameters: 1.5B
Precision: BF16
Context length: 131072
License: apache-2.0
Source: Hugging Face
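
As a rough illustration of what the 1.5B and BF16 figures above imply for memory, the sketch below estimates the size of the weights alone. It assumes the nominal count of 1.5 billion parameters (the exact count may differ) and 2 bytes per BF16 value; the helper name `weight_memory_gib` is hypothetical, and the estimate ignores activations, KV cache, and framework overhead.

```python
# Approximate bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"BF16": 2, "FP16": 2, "FP32": 4}


def weight_memory_gib(num_params: float, dtype: str = "BF16") -> float:
    """Return the approximate size of the model weights in GiB.

    Hypothetical helper for illustration: counts weights only, not
    activations, KV cache, or runtime overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # 1.5e9 is the nominal parameter count from the listing (an assumption,
    # not an exact figure for this checkpoint).
    print(f"BF16 weights: {weight_memory_gib(1.5e9):.2f} GiB")
    print(f"FP32 weights: {weight_memory_gib(1.5e9, 'FP32'):.2f} GiB")
```

Actual memory use at inference time will be higher, particularly with long prompts near the 131072-token limit, since the KV cache grows with sequence length.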
