context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16

The Llama 3.2-3B-Instruct-FP16 model, developed by Meta, is a 3.21-billion-parameter instruction-tuned multilingual large language model with a 32,768-token context length. Optimized for multilingual dialogue, it is well suited to agentic retrieval, summarization, and chat applications. The model uses an optimized transformer architecture with Grouped-Query Attention (GQA) and is fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for helpfulness and safety, outperforming many open-source and closed chat models on common industry benchmarks.
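The sketch below shows one way to run the model for chat-style generation with the Hugging Face `transformers` library. It assumes the repository id `context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16`, a recent `transformers` release with chat-template support, and a PyTorch backend; dtype and device placement are illustrative choices, not requirements.

```python
# Minimal usage sketch, assuming `transformers` and PyTorch are installed and the
# model is available under the repo id below (an assumption from this listing).
import torch
from transformers import pipeline

model_id = "context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16"

# Build a text-generation pipeline; bfloat16 and automatic device placement
# are illustrative defaults for a 3B-parameter model.
chat = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Summarize what Grouped-Query Attention does in two sentences."},
]

# The pipeline applies the model's chat template to the message list before generating.
output = chat(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```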

Status: Warm
Visibility: Public
Parameters: 3.2B
Tensor type: BF16
Context length: 32768 tokens
License: llama3.2
Hosted on: Hugging Face