nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct

nvidia/Llama-3.1-Nemotron-8B-UltraLong-1M-Instruct is an 8-billion-parameter instruction-tuned language model developed by NVIDIA and built on the Llama-3.1 architecture. It is designed for ultra-long context processing, supporting up to 1 million tokens while maintaining strong performance on standard benchmarks. This makes it suitable for applications that need to understand and generate text over very long sequences.

Warm

Public

Model Size: 8B

Quant: FP8

Ctx length: 32768

License: cc-by-nc-4.0

Hugging Face
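
Note that the context length listed for this deployment (32768) is far below the 1 million tokens the model itself supports, so long inputs may need to be checked before they are sent. The following is a minimal sketch of such a check, not part of any official client. It assumes the limit covers prompt and generated tokens together, which this page does not state, and the function names `fits_in_context` and `max_completion_tokens` are hypothetical helpers introduced for illustration. Token counts are expected to come from the model's own tokenizer.

```python
# Values copied from the listing above.
SERVED_CONTEXT_TOKENS = 32768          # "Ctx length" on this page
ADVERTISED_CONTEXT_TOKENS = 1_000_000  # "up to 1 million tokens" in the description


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    limit: int = SERVED_CONTEXT_TOKENS) -> bool:
    """Return True if prompt plus requested completion fit within the limit.

    Assumption: the limit counts prompt and generated tokens together.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= limit


def max_completion_tokens(prompt_tokens: int,
                          limit: int = SERVED_CONTEXT_TOKENS) -> int:
    """Largest completion that still fits after the prompt (0 if none)."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, limit - prompt_tokens)


if __name__ == "__main__":
    # A 30,000-token prompt leaves room for at most 2,768 generated tokens.
    print(fits_in_context(30_000, 2_768))
    print(max_completion_tokens(30_000))
```

Passing `limit=ADVERTISED_CONTEXT_TOKENS` runs the same check against the model's advertised maximum instead of this deployment's listed limit.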