nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Llama-3.1-Nemotron-70B-Instruct-HF is a 70-billion-parameter instruction-tuned causal language model developed by NVIDIA, built on the Llama 3.1 architecture with a 32,768-token context length. The model is customized to improve the helpfulness of LLM-generated responses and reports top scores on automatic alignment benchmarks such as Arena Hard, AlpacaEval 2 LC, and MT-Bench (judged by GPT-4-Turbo). It is intended for general-domain instruction following and produces coherent, helpful, and factually grounded responses.
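
Below is a minimal usage sketch with the Hugging Face `transformers` library, assuming `transformers` and `torch` are installed and that enough GPU memory (or CPU offloading via `device_map="auto"`) is available for a 70B-parameter checkpoint; the prompt text is purely illustrative.

```python
# Minimal sketch: load the model and generate a chat response with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard/offload across available devices
)

# Format the conversation with the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Explain what an instruction-tuned model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```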