nvidia/Qwen3-Nemotron-32B-GenRM-Principle
nvidia/Qwen3-Nemotron-32B-GenRM-Principle is a 32-billion-parameter large language model built on the Qwen3 foundation and fine-tuned as a Generative Reward Model. Given a user-specified principle, it assigns a reward score indicating how well an LLM-generated response satisfies that principle. The model achieves top performance on both JudgeBench (81.4%) and RM-Bench (86.2%), making it well suited for evaluating LLM response quality against defined criteria.