nvidia/Llama-3.1-Nemotron-70B-Reward-HF

nvidia/Llama-3.1-Nemotron-70B-Reward-HF is a 70-billion-parameter reward model developed by NVIDIA and based on Llama-3.1-70B-Instruct. It predicts the quality of LLM-generated responses by assigning a reward score to assistant turns in English conversations of up to 4,096 tokens. This makes it suitable for automated assessment of LLM outputs and for supplying the reward signal in Reinforcement Learning from Human Feedback (RLHF).

Warm
Public
70B
FP8
32768
License: llama3.1
Hugging Face