amd/PARD-Qwen3-0.6B
amd/PARD-Qwen3-0.6B is a 0.8 billion parameter parallel draft model based on Qwen3-0.6B, developed by AMD to accelerate Large Language Model (LLM) inference through speculative decoding. PARD adapts autoregressive draft models into parallel drafters at low training cost, and the resulting drafts generalize across different target models. In optimized inference frameworks, the model delivers up to a 4.08x speedup over standard autoregressive decoding.
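To illustrate why a draft model speeds up inference, here is a minimal, self-contained sketch of the greedy speculative-decoding loop: the draft proposes a block of tokens, the target verifies them and keeps the longest agreeing prefix plus one correction token. The toy `draft_next`/`target_next` functions below are hypothetical stand-ins for the real models, not the PARD API; the sketch only shows the accept/verify control flow.

```python
# Toy draft model (assumption, not PARD): proposes token + 1.
def draft_next(token):
    return token + 1

# Toy target model (assumption): agrees with the draft except when the
# next value would be a multiple of 4, where it skips ahead by 2.
def target_next(token):
    nxt = token + 1
    return nxt if nxt % 4 != 0 else token + 2

def speculative_decode(prompt_token, num_new, k=4):
    """Generate num_new tokens after prompt_token.

    Each round: the draft proposes k tokens autoregressively; the target
    verifies them (in a real system, in a single parallel forward pass)
    and keeps the longest agreeing prefix, then emits one token of its
    own -- either a correction at the first mismatch or a bonus token
    when all k proposals are accepted. Greedy speculative decoding is
    lossless: the output matches plain target decoding.
    """
    out = [prompt_token]
    while len(out) - 1 < num_new:
        # Draft phase: propose k tokens.
        proposals, cur = [], out[-1]
        for _ in range(k):
            cur = draft_next(cur)
            proposals.append(cur)
        # Verify phase: accept until the first disagreement.
        cur = out[-1]
        for p in proposals:
            t = target_next(cur)
            if t == p:
                out.append(p)       # proposal accepted
                cur = p
            else:
                out.append(t)       # target's correction token
                cur = t
                break
        else:
            out.append(target_next(cur))  # all accepted: bonus token
        out = out[: num_new + 1]    # trim any overshoot
    return out[1:]
```

Because verification of the whole k-token block costs roughly one target forward pass, each round can yield several tokens for the price of one, which is the source of the speedup the draft model enables.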