nvidia/Llama-3.1-Nemotron-Nano-8B-v1

nvidia/Llama-3.1-Nemotron-Nano-8B-v1 is an 8-billion-parameter large language model developed by NVIDIA and derived from Meta Llama-3.1-8B-Instruct. It is post-trained for reasoning, human chat preferences, retrieval-augmented generation (RAG), and tool calling, and aims to balance accuracy and efficiency. It supports a 32,768-token context length and is optimized for deployment on a single RTX GPU, which makes it suitable for local use in AI agent systems and chatbots.

Status: Warm
Visibility: Public
Parameters: 8B
Quantization: FP8
Context length: 32,768 tokens
License: nvidia-open-model-license
Hugging Face
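
To make the single-GPU claim concrete, the listed figures (8B parameters, FP8 weights, 32,768-token context) can be turned into a rough memory estimate. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes a nominal 8e9 parameters, the publicly documented Llama-3.1-8B attention layout (32 layers, 8 key/value heads, head dimension 128), and a 16-bit KV cache; none of these values are stated in this listing, and real usage also depends on the runtime, activations, and batch size.

```python
# Rough memory estimate for serving the model at full context.
# All architecture values below are ASSUMPTIONS taken from the public
# Llama-3.1-8B configuration, not from this listing.

GIB = 1024 ** 3

N_PARAMS = 8_000_000_000   # nominal "8B"; the exact count is slightly higher
CONTEXT_TOKENS = 32_768    # context length from the listing


def weight_bytes(n_params: int, bytes_per_param: float) -> float:
    """Memory needed to hold the weights at a given precision."""
    return n_params * bytes_per_param


def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 32,        # assumed: Llama-3.1-8B layer count
    n_kv_heads: int = 8,       # assumed: grouped-query attention KV heads
    head_dim: int = 128,       # assumed: per-head dimension
    bytes_per_value: int = 2,  # assumed: 16-bit KV cache
) -> int:
    """Memory for keys and values of one sequence of n_tokens tokens."""
    # Factor 2 covers the key tensor and the value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


fp8_weights = weight_bytes(N_PARAMS, 1)   # FP8: 1 byte per parameter
fp16_weights = weight_bytes(N_PARAMS, 2)  # FP16: 2 bytes, for comparison
kv_full = kv_cache_bytes(CONTEXT_TOKENS)

print(f"FP8 weights:        {fp8_weights / GIB:.2f} GiB")
print(f"FP16 weights:       {fp16_weights / GIB:.2f} GiB")
print(f"KV cache, 32K ctx:  {kv_full / GIB:.2f} GiB")
print(f"FP8 + KV cache:     {(fp8_weights + kv_full) / GIB:.2f} GiB")
```

Under these assumptions the FP8 weights take about 7.45 GiB and a full 32,768-token KV cache adds 4 GiB, so a single sequence at full context needs roughly 11.5 GiB before runtime overhead. A shorter context or a quantized KV cache would lower that figure.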