hbx/JustRL-Nemotron-1.5B

hbx/JustRL-Nemotron-1.5B is a 1.5-billion-parameter language model from hbx, fine-tuned for mathematical reasoning. It is trained with a deliberately simple reinforcement learning (RL) recipe: a single training stage with fixed hyperparameters. With this setup it reaches state-of-the-art performance for its size and competitive results across several mathematical benchmarks, while using substantially less compute than more complex multi-stage RL methods.

Model size: 1.5B parameters

Precision (Quant): BF16

Context length: 131,072 tokens

License: apache-2.0
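The listed size and precision give a quick estimate of how much memory the weights need. The sketch below assumes the nominal 1.5e9 parameter count from the listing (the exact count may differ) and 2 bytes per BF16 value. It covers the weights only, not the KV cache or activations, which grow with context length. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate for a model listed as "1.5B" parameters.
# Assumption: nominal 1.5e9 parameters; the true count may differ slightly.
NOMINAL_PARAMS = 1.5e9

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return memory for the weights alone, in GiB (2**30 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    print(f"BF16 weights: {weight_memory_gib(NOMINAL_PARAMS):.2f} GiB")
```

At the nominal size this works out to 3 GB, or about 2.8 GiB, for BF16 weights. Real usage is higher once inference buffers are allocated, especially near the 131,072-token context limit.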
