nvidia/Llama-3.1-Nemotron-Nano-8B-v1

The nvidia/Llama-3.1-Nemotron-Nano-8B-v1 is an 8 billion parameter large language model developed by NVIDIA, derived from Meta Llama-3.1-8B-Instruct. It is specifically post-trained for enhanced reasoning, human chat preferences, RAG, and tool calling, offering a balance of accuracy and efficiency. This model supports a 32,768 token context length and is optimized for deployment on a single RTX GPU, making it suitable for local use in AI agent systems and chatbots.

Warm

Public

Model Size: 8B

Quant: FP8

Ctx length: 32768

License: other

Hugging Face