nvidia/Llama-3.1-Nemotron-70B-Reward-HF

The nvidia/Llama-3.1-Nemotron-70B-Reward-HF is a 70 billion parameter reward model developed by NVIDIA, based on the Llama-3.1-70B-Instruct architecture. It is specifically designed to predict the quality of LLM-generated responses by assigning a reward score to assistant turns in English conversations up to 4,096 tokens. This model excels at evaluating response quality, making it suitable for applications requiring automated assessment of LLM outputs and for use in Reinforcement Learning from Human Feedback (RLHF).

Warm

Public

Model Size: 70B

Quant: FP8

Ctx length: 32768

License: llama3.1

Hugging Face