RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV
RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV is an 8-billion-parameter instruction-tuned causal language model developed by RedHatAI and based on Meta-Llama-3. Its weights and activations are quantized to FP8, and it uses an FP8 KV cache, reducing memory footprint and enabling efficient inference with vLLM. It retains strong accuracy, scoring 74.98% on GSM8K (5-shot), making it well suited to resource-constrained, high-throughput deployments.
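As a minimal sketch of the inference setup described above, the model can be loaded with vLLM's offline `LLM` API; this assumes vLLM is installed and a GPU with FP8 support is available, and the sampling settings shown are illustrative, not recommendations from the model authors.

```python
# Minimal vLLM usage sketch (assumes vLLM is installed and a GPU
# with FP8 support is available; sampling values are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV",
    kv_cache_dtype="fp8",  # use the FP8 KV cache alongside FP8 weights/activations
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Setting `kv_cache_dtype="fp8"` stores the attention key/value cache in FP8, which is what allows longer contexts or larger batches to fit in the same GPU memory.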