RedHatAI/Qwen2-1.5B-Instruct-FP8
RedHatAI/Qwen2-1.5B-Instruct-FP8 is a 1.5-billion-parameter, instruction-tuned causal language model based on Qwen2 and developed by Neural Magic. It is an FP8-quantized version of Qwen2-1.5B-Instruct, which reduces its memory footprint and speeds up inference with vLLM. It retains 98.93% of the unquantized model's average score on the OpenLLM benchmark and is intended for English-language, assistant-style chat applications.
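To give a rough sense of the memory saving, the sketch below estimates weight storage at 16-bit versus 8-bit precision. It assumes a round 1.5 billion parameters and that every parameter is stored at the stated width. Both are simplifications: some layers may stay in higher precision, and the KV cache, activations, and runtime overhead are not counted. The helper name `weight_memory_gib` is hypothetical and used only for illustration.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# Assumption: all parameters are stored at the given width, which is an
# upper bound on the saving from quantization.

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate weight storage in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 1.5e9  # assumed round figure taken from the description

fp16_gib = weight_memory_gib(NUM_PARAMS, "fp16")
fp8_gib = weight_memory_gib(NUM_PARAMS, "fp8")

print(f"FP16 weights: ~{fp16_gib:.2f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.2f} GiB")
```

Under these assumptions the FP8 weights take about half the space of a 16-bit copy. The real saving depends on which tensors are actually quantized.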