amd/PARD-Qwen3-0.6B

amd/PARD-Qwen3-0.6B is a 0.8-billion-parameter, Qwen-based parallel draft model developed by AMD to accelerate Large Language Model (LLM) inference through speculative decoding. PARD adapts autoregressive draft models into parallel draft models with low-cost training, and the resulting drafts generalize well across different target models. In optimized inference frameworks, the model delivers up to a 4.08x speedup.
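To illustrate where a draft model fits, here is a minimal, self-contained sketch of greedy speculative decoding. The `target_next` and `draft_next` functions are toy stand-ins invented for this example (the real target and PARD draft are neural models run inside a serving framework), and this sketch keeps only the propose-and-verify loop; PARD itself additionally proposes its draft tokens in a single parallel forward pass rather than autoregressively.

```python
# Toy sketch of greedy speculative decoding. target_next/draft_next are
# hypothetical stand-ins for the large target model and the small draft model.

def target_next(seq):
    # Stand-in for the target model's greedy next token.
    return (seq[-1] * 7 + 3) % 101

def draft_next(seq):
    # Stand-in for the draft model: agrees with the target most of the
    # time, but diverges whenever the last token is a multiple of 10.
    x = (seq[-1] * 7 + 3) % 101
    return x if seq[-1] % 10 != 0 else (x + 1) % 101

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens greedily: the draft proposes k tokens per round,
    the target verifies them, and the first mismatch is corrected."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        # 1) Draft proposes k tokens.
        draft_seq = list(seq)
        for _ in range(k):
            draft_seq.append(draft_next(draft_seq))
        proposals = draft_seq[len(seq):]
        # 2) Target verifies: accept the longest matching prefix, then
        #    emit the target's own token at the first mismatch.
        for tok in proposals:
            correct = target_next(seq)
            seq.append(correct)
            if tok != correct:
                break  # reject the remaining draft tokens
            if len(seq) - len(prompt) >= n_tokens:
                break
    return seq[len(prompt):len(prompt) + n_tokens]
```

Because every accepted token is checked against the target, the output is identical to plain greedy decoding with the target alone; the speedup comes from the target verifying a whole batch of draft tokens in one forward pass instead of generating them one at a time.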

Visibility: Public
Parameters: 0.8B
Precision: BF16
Context length: 32768
License: MIT
Hosted on: Hugging Face
