mlx-community/Llama-3.1-Nemotron-70B-Instruct-HF-bf16
mlx-community/Llama-3.1-Nemotron-70B-Instruct-HF-bf16 is a 70-billion-parameter instruction-tuned language model converted to MLX format from NVIDIA's Llama-3.1-Nemotron-70B-Instruct-HF. The model supports a 32768-token context length and is intended for general-purpose conversational AI and instruction-following tasks. Its large parameter count and instruction tuning make it well suited to complex natural language understanding and generation.
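
Below is a minimal usage sketch assuming the `mlx-lm` package is installed (`pip install mlx-lm`); the prompt text and generation settings are illustrative placeholders, not part of the model card.

```python
from mlx_lm import load, generate

# Load the MLX-converted weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("mlx-community/Llama-3.1-Nemotron-70B-Instruct-HF-bf16")

# Example chat-style prompt; the message content here is just a placeholder.
messages = [{"role": "user", "content": "Explain the difference between a list and a tuple in Python."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a response; verbose=True prints tokens as they are produced.
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```

Note that the bf16 weights of a 70B model require a machine with substantial unified memory; smaller quantized variants are typically a better fit for constrained hardware.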