nvidia/Qwen3-Nemotron-14B-BRRM

The nvidia/Qwen3-Nemotron-14B-BRRM is a 14 billion parameter Branch-and-Rethink Reasoning Reward Model developed by NVIDIA. This model implements a novel two-turn reasoning framework for evaluating LLM-generated responses, performing adaptive branching and branch-conditioned rethinking to focus on critical evaluation dimensions. It achieves state-of-the-art performance on major reward modeling benchmarks, making it suitable for integrating into RLHF pipelines to improve LLM response quality.

Cold
Public
14B
FP8
32768
License: nvidia-internal-scientific-research-and-development-model-license
Hugging Face

No reviews yet. Be the first to review!