RedHatAI/Qwen2-1.5B-Instruct-FP8

RedHatAI/Qwen2-1.5B-Instruct-FP8 is a 1.5-billion-parameter, instruction-tuned causal language model based on Qwen2 and developed by Neural Magic. It is an FP8-quantized version of Qwen2-1.5B-Instruct, which reduces the memory footprint and speeds up inference with vLLM. The quantized model retains 98.93% of the original model's average score on the OpenLLM benchmark and is intended for English-language, assistant-style chat applications.
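To put the reduced memory footprint in rough numbers, the sketch below estimates weight storage at 16-bit and 8-bit precision. It is a back-of-the-envelope calculation, not a measurement. It assumes the nominal 1.5B parameter count from this listing and assumes every parameter is stored at the stated width, so it ignores any layers left unquantized, quantization scales, the KV cache, and activations. The names `PARAMS`, `BYTES_PER_PARAM`, and `weight_gib` are illustrative only.

```python
# Rough weight-memory estimate. All figures are nominal assumptions,
# not measured values for this checkpoint.

PARAMS = 1.5e9  # nominal parameter count taken from the listing ("1.5B")

# Storage width per parameter, in bytes, for each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gib(params: float, dtype: str) -> float:
    """Approximate weight storage in GiB, assuming every parameter uses `dtype`."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_gib(PARAMS, dtype):.2f} GiB of weights")
```

Under these assumptions the weights take about 2.8 GiB at 16-bit precision and about 1.4 GiB at FP8, a 2x reduction. The real saving is likely somewhat smaller if some layers remain in higher precision.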

Warm

Public

Model Size: 1.5B

Quant: BF16

Ctx length: 32768

License: apache-2.0
